The video summarizes predictions from Google DeepMind CEO Demis Hassabis about the trajectory of artificial intelligence in 2026, highlighting several areas where Google is positioning itself to dominate.
The key developments expected are:
- Full Omni-models and Multimodality: Hassabis predicts a strong convergence of modalities, leading to “full omni-models.” Google’s Gemini foundation model is already built to be multimodal, handling images, video, text, and audio. The image model, Nano Banana Pro, is an example of this, demonstrating sophisticated visual understanding and the ability to create accurate infographics. The ultimate goal is a stack that spans robotics, images, video, audio, 3D, and text (a minimal multimodal-call sketch appears after this list).
- Advancements in Robotics: Google’s Gemini Robotics 1.5 is a new family of models designed to power the next generation of physical agents. These agents can solve complex, multi-step tasks (like sorting laundry or fruit) by perceiving the environment and “thinking” step by step. A significant feature is that the same model can drive Google’s different robot form factors without per-robot fine-tuning. These agents can also use the internet to answer questions and solve problems, such as looking up local waste guidelines before sorting trash.
- Video Generation and Live Interaction: The video highlights the anticipated progress in video models, with Google’s Veo 3 expected to remain a leader in video generation. A key feature is Gemini Live, which combines multimodality with live speech and on-the-fly reasoning. A viral demonstration showed Gemini Live guiding a user through an entire complex task, such as a car oil change, underscoring its usefulness as a real-time AI guide.
- World Models: Hassabis is personally working on “world models,” which are expected to be a major theme in 2026. Google’s Genie 3 is an interactive video model that generates virtual worlds users can explore like a simulation or game. These worlds react to movement and actions in real-time, maintain “world memory” (where actions persist), and allow for “promptable events” (adding new characters or objects on the fly). These models are anticipated to be crucial for next-generation gaming, embodied research, and simulating complex scenarios.
- Agent-Based Systems: Google is heavily focused on developing sophisticated AI agents. Examples mentioned include:
  - Co-scientist: A multi-agent system that acts as a virtual collaborator, proposing and refining scientific hypotheses and research plans (a simplified propose-and-refine sketch follows this list).
  - CodeMender: An agent developed to detect, debug, and fix security vulnerabilities in open-source codebases.
  - Data Science Agent: An assistant that automates end-to-end data science work.
  - AlphaEvolve: A coding agent for scientific and algorithmic discovery.
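To make the multimodality point concrete, here is a minimal sketch of a single request that mixes text and an image, assuming the publicly available google-generativeai Python SDK; the model name, file path, and prompt are illustrative placeholders, not details from the video.

```python
# Minimal sketch of a multimodal Gemini call: one model, mixed text + image input.
# Assumes the google-generativeai SDK (pip install google-generativeai pillow);
# the model name and file path below are placeholders, not taken from the video.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")  # assumption: key supplied by the caller

model = genai.GenerativeModel("gemini-1.5-flash")  # any multimodal Gemini model

# Text and image parts are passed together in a single request; the same
# interface also accepts uploaded audio and video files.
image = Image.open("kitchen_photo.jpg")  # hypothetical local image
response = model.generate_content(
    ["Describe what is on the counter and suggest a recipe.", image]
)
print(response.text)
```

The same call shape extends to other modalities, which is what “full omni-models” would push further: one model, one interface, many input and output types.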
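The Co-scientist item describes a propose-and-refine collaboration between model roles. The sketch below is not Google’s Co-scientist architecture; it only illustrates that pattern with the same assumed SDK, and the prompts, research question, and iteration count are invented for the example.

```python
# Illustrative propose-and-refine loop in the spirit of a multi-agent
# "co-scientist": one role drafts a hypothesis, another critiques it.
# Not Google's actual system; prompts, model name, and loop count are assumptions.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro")  # placeholder model name

question = "Why do some bacterial strains acquire antibiotic resistance faster than others?"
hypothesis = model.generate_content(
    f"Propose a testable hypothesis for: {question}"
).text

for _ in range(2):  # a couple of refinement rounds
    critique = model.generate_content(
        f"Critique this hypothesis and point out weaknesses:\n{hypothesis}"
    ).text
    hypothesis = model.generate_content(
        "Revise the hypothesis to address the critique.\n"
        f"Hypothesis: {hypothesis}\nCritique: {critique}"
    ).text

print(hypothesis)
```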
The video concludes that the combination of these agents and the rapid progress across modalities will lead to surprising and impressive advances from Google in 2026.
