Gemini Omni Is Here — Everything You NEED to Know

Google just announced Gemini Omni at I/O 2026, and it’s the company’s biggest move in AI video yet. Gemini Omni is a new family of models from Google DeepMind built on one simple idea: a single model that can create anything from any input, starting with video. The first version, Gemini Omni Flash, is available now and has already replaced Veo as the default video model inside the Gemini app. If you’ve used Nano Banana for images, Google is essentially calling this Nano Banana for video.

The headline feature is multimodal input. You can feed Omni any combination of text, images, audio and video, and it combines them into a single finished video. It supports up to five reference photos so characters, objects and locations stay consistent. The feature getting the most attention is editing: with Omni you edit video through conversation, giving one instruction after another, and each change builds on the last so your characters and scenes stay consistent. You can even take real footage you filmed yourself and ask Omni to change what’s happening inside it.

All of it is grounded in Gemini’s improved world knowledge, with a better understanding of physics like gravity, momentum and fluid dynamics, so scenes hold together more realistically. Omni also generates audio natively and introduces AI avatars, reusable digital versions of yourself that can appear in your videos.

Gemini Omni Flash is rolling out now for Google AI Plus plans and higher across the Gemini app, Google Flow and YouTube Shorts, with free access coming to YouTube Shorts and YouTube Create later this week. Clips currently cap at 10 seconds, and every generation carries a SynthID watermark. A developer API is coming within weeks, and Google has already teased a more powerful model called Omni Pro.
