DeepMind’s AI Just Solved Video Generation In A Way Nobody Expected

DeepMind’s Veo 3, a text-to-video generative model, produces strikingly realistic footage and marks a major leap in the fidelity of generated video.

The model appears to have an intuitive grasp of physics, light transport, and material properties: its reflections and specular highlights stay consistent as the scene moves. The most surprising finding is that many of its capabilities, such as image inpainting, outpainting, and segmentation, are emergent. Nobody explicitly trained the model for these tasks; they arose on their own from training on vast amounts of video data.

Because the model generates a video one frame after another, it can also work through a visual problem step by step, for example tracing a path through a maze. The authors call this frame-by-frame reasoning the “chain of frames”, by analogy with chain-of-thought reasoning in language models. Despite its power, the model is not flawless: it still makes logical errors and fails some simple IQ-style puzzles.