Insane Text2Vid: Generating Movies from Minor Prompts
Text2Vid is getting more and more insane. Learn how movies could be generated from minor prompts using a diffusion model and GPT-4, and how turning lots of existing movies back into scripts creates the dataset needed to train the other direction.
I just clued in how insane text2vid will get soon. As crazy as this sounds, we will be able to generate movies from just minor prompts and the path there is pretty clear.
— tobAI lutke (@tobi) March 29, 2023
Whisper allows very good transcription of existing videos and movies. Speaker detection is lacking but minor problem.
— tobAI lutke (@tobi) March 29, 2023
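The transcription step above can be sketched roughly as follows, assuming the open-source `openai-whisper` package is installed; `segments_to_lines` is a hypothetical helper for formatting the output for the later scripting step:

```python
def transcribe(path, model_size="base"):
    # Run Whisper on a video/audio file and return its segment dicts.
    # Assumption: the open-source `openai-whisper` package is installed.
    import whisper
    model = whisper.load_model(model_size)
    return model.transcribe(path)["segments"]

def segments_to_lines(segments):
    # Format Whisper segments ({"start", "end", "text"}) as timestamped
    # dialogue lines, ready to feed into the scripting step.
    return [f"[{s['start']:07.2f}s] {s['text'].strip()}" for s in segments]
```

As the thread notes, Whisper itself does not label speakers; attributing lines to characters would need a separate diarization pass.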
CLIP and BLIP-2 are very good at extracting scene descriptions from still images, so you can also get set design, shot description and color grading.
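A sketch of that captioning step, assuming Hugging Face `transformers` with a BLIP-2 checkpoint; `frame_timestamps` is a hypothetical helper for choosing which stills of a scene to caption:

```python
def frame_timestamps(duration_s, every_s=2.0):
    # Evenly spaced sample times (in seconds) for pulling stills from a scene.
    n = int(duration_s // every_s) + 1
    return [round(i * every_s, 2) for i in range(n)]

def describe_frame(image_path):
    # Caption one still frame with BLIP-2. Assumptions: `transformers`,
    # `torch`, and `Pillow` are installed and the checkpoint can be fetched.
    from PIL import Image
    from transformers import Blip2Processor, Blip2ForConditionalGeneration
    processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
    model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-opt-2.7b")
    inputs = processor(images=Image.open(image_path), return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=40)
    return processor.decode(out[0], skip_special_tokens=True).strip()
```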
When you feed scene transcript + scene description into GPT-4 and prompt it to turn them into a movie script, you get very good results. Also, there are lots of movie scripts for real cinema floating around the internet, which gets you ground truth.
— tobAI lutke (@tobi) March 29, 2023
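One way to wire transcript + description into GPT-4: the prompt wording below is my own assumption, and the API sketch assumes the `openai` Python package with an `OPENAI_API_KEY` in the environment:

```python
def build_script_prompt(transcript, scene_description):
    # Hypothetical prompt template; the thread only says to feed
    # transcript + description in and ask for a movie script.
    return (
        "Rewrite the following material as a movie script in standard "
        "screenplay format (sluglines, action lines, dialogue).\n\n"
        f"SCENE DESCRIPTION:\n{scene_description}\n\n"
        f"DIALOGUE TRANSCRIPT:\n{transcript}\n"
    )

def scene_to_script(transcript, scene_description, model="gpt-4"):
    # Sketch against the OpenAI chat completions API.
    from openai import OpenAI
    client = OpenAI()
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user",
                   "content": build_script_prompt(transcript, scene_description)}],
    )
    return resp.choices[0].message.content
```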
After turning lots of (all?) movies back into scripts you have a dataset that you can train the other way, diffusion model style.
— tobAI lutke (@tobi) March 29, 2023
A movie can be generated with dummy actors, who are then replaced with fitting LoRA-finetuned virtual actors in a post-processing pass
— tobAI lutke (@tobi) March 29, 2023
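That recasting pass could look roughly like this with Hugging Face `diffusers`; the placeholder-token scheme and the `recast_prompt` helper are illustrative assumptions:

```python
def recast_prompt(prompt, dummy_token, actor_token):
    # Swap the dummy actor's placeholder token for the LoRA-finetuned
    # virtual actor's trigger token before regenerating the shot.
    return prompt.replace(dummy_token, actor_token)

def recast_shot(base_model_id, actor_lora_path, prompt):
    # Load a base diffusion pipeline, attach the actor's LoRA weights,
    # and regenerate the shot. Assumes `diffusers` and `torch` are installed.
    import torch
    from diffusers import DiffusionPipeline
    pipe = DiffusionPipeline.from_pretrained(base_model_id, torch_dtype=torch.float16)
    pipe.load_lora_weights(actor_lora_path)
    return pipe(prompt).images[0]
```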
So, soon you will be able to describe a scene, get a movie script to edit, assign virtual actors, add a cinematographic direction and a sound design prompt, and get a full draft movie back overnight. Further editing can be structured as a chat.
— tobAI lutke (@tobi) March 29, 2023