Google DeepMind unveils Genie 3, a text-to-world AI model with real-time interactivity
In a LinkedIn post, Inbar Mosseri introduces Genie 3 as a step toward intelligent simulations, generating playable environments from text prompts in real time.
Photo credit: Google DeepMind
Google DeepMind has announced Genie 3, its latest AI world model, capable of generating interactive 3D environments directly from text.
The model allows users to navigate simulated worlds rendered at 720p and 24 frames per second, with visual consistency extending up to one minute.
The update was introduced in a LinkedIn post by Inbar Mosseri, Team Lead and Senior Staff Research Scientist at DeepMind. Mosseri describes the system as a “general-purpose world model” that opens up new possibilities for generative AI and simulation-based learning.
Mosseri posted, “Excited to introduce Genie 3, our general-purpose world model that generates interactive, playable worlds directly from text. It supports real-time interaction at 720p and 24 FPS, with 1 minute of visual consistency—a big step toward truly intelligent simulations.”
Genie 3 supports interactive scenes across categories including natural environments, animated characters, historical settings, and physically reactive terrain. DeepMind says the system represents a milestone in its broader effort toward artificial general intelligence (AGI).
Applications include education, games, and robotics
Examples shared by DeepMind include immersive scenes of volcanoes, ocean trenches, Zen gardens, and simulated deep-sea life. Other prompts render first-person perspectives of real-world situations, such as walking through cities or navigating natural disasters.
In educational settings, Genie 3 could eventually serve as a simulation tool for subjects like geography, biology, or history. In robotics, the system may assist with training agents in physically grounded environments. The model also shows potential for video game development and virtual prototyping.
The system was shown generating scenarios ranging from serene landscape walkthroughs to abstract fantasy environments featuring glowing creatures, paper-folded lizards, and gravity-defying terrain.
Real-time control and long-horizon consistency
One of the key advances in Genie 3 is its ability to maintain physical and visual consistency over time, while also responding to user input in real time. This means environments can evolve based on player actions without losing spatial or narrative coherence.
The model’s architecture supports auto-regressive generation with memory, enabling it to “refer back” to user inputs or previous states from up to one minute earlier. DeepMind notes that this represents a technical challenge, as small errors in previous frames can compound over time.
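DeepMind has not published Genie 3's architecture, but the general pattern it describes — generating each frame conditioned on recent frames and user actions, with a bounded memory window — can be illustrated with a toy sketch. Everything here (the `next_frame` placeholder, the window size, the trivial dynamics) is an assumption for illustration, not DeepMind's implementation:

```python
from collections import deque

# Toy illustration of auto-regressive generation with a bounded memory window.
# NOT Genie 3's actual architecture; it only shows the loop structure: each new
# "frame" is conditioned on a window of prior frames plus the latest user
# action, and frames older than the window are evicted.

FPS = 24
MEMORY_SECONDS = 60                       # roughly the one minute of context reported
MEMORY_FRAMES = FPS * MEMORY_SECONDS

def next_frame(memory, action):
    """Stand-in for the model: derive the next state from past states + action."""
    last = memory[-1] if memory else 0
    return last + action                  # placeholder dynamics

def generate(actions):
    memory = deque(maxlen=MEMORY_FRAMES)  # oldest frames drop out automatically
    frames = []
    for action in actions:
        frame = next_frame(memory, action)
        memory.append(frame)              # the model's own output is fed back in
        frames.append(frame)
    return frames

frames = generate([1, 0, 2, -1])
print(frames)  # [1, 1, 3, 2]
```

Because each frame is built on the model's own previous outputs, any small error in an early frame propagates through every later one — which is the compounding-error challenge DeepMind notes.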
According to DeepMind, Genie 3 marks a shift from video generation to environment simulation, where user interaction and continuity are central.