Discovering New Capabilities of GAIA-1
We’ve only had access to GAIA-1 for a few weeks and are discovering new capabilities every day. The results are phenomenal! GAIA-1 isn't just a generative video model; it is a world model, i.e. one controllable by video, text, and action prompts.
We’ve only had access to GAIA-1 for a few weeks and are discovering new capabilities every day. The results are phenomenal!
— Alex Kendall (@alexgkendall) June 21, 2023
GAIA isn't just a generative video model, it is a world model, ie. controllable by video, text & action prompts
Why is this huge for self-driving? Thread:
A world model is a generative model capable of predicting what happens next depending on the action we take, as demonstrated in this first video. @ylecun argues it is the crucial architectural component to unlock machine intelligence: https://t.co/MbJPJH3zkG
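The idea above can be made concrete with a toy sketch. The class and functions below are hypothetical stand-ins, not GAIA-1's actual architecture or API: a world model exposes a one-step predictor, and chaining predictions lets an agent "imagine" the consequences of a sequence of actions before taking them.

```python
import random

class ToyWorldModel:
    """Minimal sketch of a world-model interface (hypothetical, not GAIA-1).

    A world model predicts the next state given the current state and a
    candidate action, so an agent can reason about what happens next.
    """

    def predict(self, state, action):
        # Stand-in dynamics: a real model would run a learned network here.
        return state + action + random.gauss(0.0, 0.1)

def rollout(model, state, actions):
    """Imagine a trajectory by chaining one-step predictions."""
    trajectory = [state]
    for action in actions:
        state = model.predict(state, action)
        trajectory.append(state)
    return trajectory

model = ToyWorldModel()
imagined = rollout(model, state=0.0, actions=[1.0, 1.0, -0.5])
print(len(imagined))  # initial state plus one predicted state per action
```

Different action sequences passed to `rollout` yield different imagined futures from the same starting state, which is the property the thread highlights.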
We have been working on world models for autonomous driving for 6 years, starting with this early result in 2018: https://t.co/kZFiKDGy4D
GAIA brings together the latest advances in self-supervised learning, large language models and generative AI, and represents a step-change in capabilities. The result is a world model we can control with video, language and action to explore scenes of extraordinary diversity:
1/ We can prompt the world model to predict multiple, diverse futures from a video sequence:
2/ We can generate a scene from a text prompt, like this video which was generated from scratch with the prompt, “Going around a stopped bus”:
3/ And finally, we can action-condition the model, allowing us to drive within our imagination, e.g. with a sinusoidal input command to zig-zag through the world.
Remarkable to see GAIA-1 generate robust scenes significantly outside of the on-road training distribution!
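As a hedged sketch of the "sinusoidal input command" idea: an action-conditioned model takes one control value per frame, so a zig-zag drive is just a sine wave sampled over time. The function name, parameters, and the `model.predict` call in the comment are illustrative assumptions, not GAIA-1's real interface.

```python
import math

def sinusoidal_steering(num_steps, period=20, amplitude=0.5):
    """One steering command per frame, oscillating in [-amplitude, amplitude].

    Hypothetical helper: generates the kind of zig-zag action sequence
    the thread describes feeding to an action-conditioned world model.
    """
    return [amplitude * math.sin(2 * math.pi * t / period)
            for t in range(num_steps)]

commands = sinusoidal_steering(num_steps=40)
# An action-conditioned model (interface assumed) could then roll out
# the imagined drive frame by frame, e.g.:
#   frame = model.predict(frame, command) for each command in `commands`
```

Because the command sequence is chosen freely rather than observed, it can push the model well outside its training distribution, which is the behavior the tweet calls out.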
Why is this important?
- Safety: providing explicit ability to reason about the impact of our actions
- Intelligence: allowing richer understanding of dynamic scenes
- Training: unlocking model-based policy learning
- Simulation: enabling exploration of videos in closed-loop
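The "model-based policy learning" point above can be sketched in a few lines. Everything here is a toy stand-in under stated assumptions (GAIA-1's real interfaces are not public): candidate driving policies are evaluated entirely inside a learned model by rolling them forward in imagination and scoring the predicted outcomes.

```python
def imagined_return(policy, dynamics, reward, state, horizon=10):
    """Roll a policy forward inside a (learned) model and sum predicted rewards.

    `dynamics` stands in for the world model's one-step prediction;
    `reward` scores each imagined state. Both are toy functions below.
    """
    total = 0.0
    for _ in range(horizon):
        action = policy(state)
        state = dynamics(state, action)
        total += reward(state)
    return total

# Toy setup: reward keeping the state (e.g. lane offset) near zero.
dynamics = lambda s, a: s + a
reward = lambda s: -abs(s)
stay_policy = lambda s: -0.5 * s   # damps deviations back toward zero
drift_policy = lambda s: 1.0       # always steers the same way

# The model lets us rank policies without acting in the real world:
better = imagined_return(stay_policy, dynamics, reward, 2.0)
worse = imagined_return(drift_policy, dynamics, reward, 2.0)
```

This is the safety and training argument in miniature: the cost of trying a bad policy is paid in imagination rather than on the road.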
Read more in our blog here, or follow @Jamie_Shotton, @wayve_ai and me for more updates: https://t.co/zmknSwzzvz