This week we will look at another great session from Ascend24. Last month we hosted Ascend24 in Palo Alto - a half-day AI-Driven Growth Summit which brought together industry experts, thought leaders, and practitioners to discuss strategies for maximizing customer value and driving revenue growth.
Andrew Bunner, Technologist on the Google DeepMind team, led a session on Advances in AI.
Advances in AI
Andrew has spent over 8 years working with multimodal models, and most of that time was spent on classic computer vision work - teaching computers to recognize pictures. Over the last couple of years, Andrew shifted to teaching computers to draw completely new pictures.
Andrew started off by providing a comprehensive overview and history of generative multimodal models, focusing on their capabilities, limitations, and potential applications.
Understanding Generative Multimodal Models
Generative multimodal models are AI systems capable of generating new content, such as images or text, based on the information they have been trained on. Andrew explained that these models are built on transformer networks, which process data as sequences of tokens and can learn complex patterns.
The Training Process
Training data: Models are trained on massive datasets of text and images.
Masked tokens: During training, portions of the data are masked, and the model is tasked with predicting the missing parts.
Gradient descent: The model's weights are adjusted iteratively to minimize the error between its predictions and the actual data (a minimal sketch of these two steps follows this list).
Knowledge compression: The model learns to compress the information in the training data into a compact representation.
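To make the masked-token and gradient-descent steps concrete, here is a minimal sketch of such a training loop in PyTorch. It is not code from Andrew's session; the tiny model, toy vocabulary, and random stand-in data are assumptions chosen purely for illustration.

```python
# Minimal, illustrative masked-token training loop (toy model and data are assumptions).
import torch
import torch.nn as nn

VOCAB_SIZE = 1000   # assumed toy vocabulary size
MASK_ID = 0         # assumed id reserved for the [MASK] token
SEQ_LEN = 16

class TinyMaskedModel(nn.Module):
    """A tiny transformer that predicts a token id at every position."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, 64)
        layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(64, VOCAB_SIZE)

    def forward(self, token_ids):
        return self.head(self.encoder(self.embed(token_ids)))

model = TinyMaskedModel()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):
    # Random token sequences stand in for real training text/images here.
    tokens = torch.randint(1, VOCAB_SIZE, (8, SEQ_LEN))

    # Mask roughly 15% of the tokens; the model must predict the originals.
    mask = torch.rand(tokens.shape) < 0.15
    inputs = tokens.clone()
    inputs[mask] = MASK_ID

    logits = model(inputs)                       # predictions for every position
    loss = loss_fn(logits[mask], tokens[mask])   # error measured only on masked positions

    optimizer.zero_grad()
    loss.backward()    # gradient descent: compute gradients of the error
    optimizer.step()   # ...and nudge the weights to reduce it
```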
Limitations and Challenges
These models come with several limitations and challenges. The one most people will have heard of or encountered themselves is hallucination: models can sometimes generate incorrect or nonsensical content. Models may also struggle to understand the context of a query or prompt, and when it comes to domain-specific questions they may lack expertise, leading to inaccurate or incomplete responses.
Andrew then went into detail on how these models can be improved using approaches such as prompt engineering, retrieval-augmented generation, and fine-tuning. Combining multiple techniques usually yields the best results, and the efficacy of each technique depends heavily on the use case.
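As one concrete example of these techniques, here is a rough sketch of a retrieval-augmented generation flow in Python. It is not from the session: the document snippets are invented, the keyword-overlap retrieval stands in for the vector search a real system would use, and call_llm is a hypothetical placeholder for whichever model API you call.

```python
# Rough RAG sketch: retrieve relevant documents, then ground the model's answer in them.
from collections import Counter

# Toy "knowledge base" the model was never trained on (invented examples).
DOCUMENTS = [
    "Our enterprise plan includes 24/7 support and a dedicated account manager.",
    "Refunds are processed within 14 days of a cancellation request.",
    "The mobile app supports offline mode on iOS and Android.",
]

def score(query: str, doc: str) -> int:
    """Crude keyword-overlap relevance score; real systems use vector embeddings."""
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    return sum((q & d).values())

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k most relevant documents for the query."""
    return sorted(DOCUMENTS, key=lambda doc: score(query, doc), reverse=True)[:k]

def call_llm(prompt: str) -> str:
    """Hypothetical placeholder for a call to your model of choice."""
    return f"[model response to a prompt of {len(prompt)} characters]"

def answer(query: str) -> str:
    # Paste retrieved context into the prompt so the model answers from supplied
    # facts rather than memory alone, which helps with domain-specific questions.
    context = "\n".join(retrieve(query))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return call_llm(prompt)

print(answer("How long do refunds take?"))
```

The key design choice is that retrieval adds fresh, domain-specific facts to the prompt at query time, which is often cheaper than fine-tuning and helps reduce hallucinations.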
Future Outlook
Continued advancements: Andrew is optimistic about the future of generative multimodal models, predicting further improvements in their capabilities.
Practical applications: These models have the potential to revolutionize various industries, from content creation to customer service.
Ethical considerations: As these models become more powerful, it is crucial to address ethical concerns related to their use, such as bias and misinformation.
Generative multimodal models are a rapidly evolving technology with immense potential. While they face challenges, ongoing research and development are addressing these limitations.
Watch the full video to learn more about Andrew, the long history of AI, and what to expect next in the journey!