Insane Text2Vid: Generating Movies from Minor Prompts
Text2Vid is getting more and more insane. Learn how movies could be generated from minor prompts using a diffusion model and GPT-4, and how turning lots of existing movies back into scripts creates the dataset needed to train the other direction.
I just clued in how insane text2vid will get soon. As crazy as this sounds, we will be able to generate movies from just minor prompts and the path there is pretty clear.
— tobAI lutke (@tobi) March 29, 2023
Whisper allows very good transcription of existing videos and movies. Speaker detection is lacking but minor problem.
— tobAI lutke (@tobi) March 29, 2023
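The transcription step above can be sketched roughly as follows, assuming the open-source `openai-whisper` package is installed; `segments_to_lines` is a hypothetical helper for formatting the output for the later scripting step:

```python
def transcribe(path, model_size="base"):
    # Run Whisper on a video/audio file and return its segment dicts.
    # Assumption: the open-source `openai-whisper` package is installed.
    import whisper
    model = whisper.load_model(model_size)
    return model.transcribe(path)["segments"]

def segments_to_lines(segments):
    # Format Whisper segments ({"start", "end", "text"}) as timestamped
    # dialogue lines, ready to feed into the scripting step.
    return [f"[{s['start']:07.2f}s] {s['text'].strip()}" for s in segments]
```

As the thread notes, Whisper itself does not label speakers; attributing lines to characters would need a separate diarization pass.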
CLIP and BLIP-2 are very good at extracting scene descriptions from still images, so you can also get set design, shot description and color grading.
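A sketch of that captioning step, assuming Hugging Face `transformers` with a BLIP-2 checkpoint; `frame_timestamps` is a hypothetical helper for choosing which stills of a scene to caption:

```python
def frame_timestamps(duration_s, every_s=2.0):
    # Evenly spaced sample times (in seconds) for pulling stills from a scene.
    n = int(duration_s // every_s) + 1
    return [round(i * every_s, 2) for i in range(n)]

def describe_frame(image_path):
    # Caption one still frame with BLIP-2. Assumptions: `transformers`,
    # `torch`, and `Pillow` are installed and the checkpoint can be fetched.
    from PIL import Image
    from transformers import Blip2Processor, Blip2ForConditionalGeneration
    processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
    model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-opt-2.7b")
    inputs = processor(images=Image.open(image_path), return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=40)
    return processor.decode(out[0], skip_special_tokens=True).strip()
```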
When you feed scene transcript + scene description into GPT-4 and prompt it to turn them into a movie script, you get very good results. Also, there are lots of movie scripts for real cinema floating around the internet, which gets you ground truth.
— tobAI lutke (@tobi) March 29, 2023
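One way to wire transcript + description into GPT-4: the prompt wording below is my own assumption, and the API sketch assumes the `openai` Python package with an `OPENAI_API_KEY` in the environment:

```python
def build_script_prompt(transcript, scene_description):
    # Hypothetical prompt template; the thread only says to feed
    # transcript + description in and ask for a movie script.
    return (
        "Rewrite the following material as a movie script in standard "
        "screenplay format (sluglines, action lines, dialogue).\n\n"
        f"SCENE DESCRIPTION:\n{scene_description}\n\n"
        f"DIALOGUE TRANSCRIPT:\n{transcript}\n"
    )

def scene_to_script(transcript, scene_description, model="gpt-4"):
    # Sketch against the OpenAI chat completions API.
    from openai import OpenAI
    client = OpenAI()
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user",
                   "content": build_script_prompt(transcript, scene_description)}],
    )
    return resp.choices[0].message.content
```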
After turning lots of (all?) movies back into scripts you have a dataset that you can train the other way, diffusion model style.
— tobAI lutke (@tobi) March 29, 2023
A movie can be generated with dummy actors, who are then replaced with fitting LoRA-finetuned virtual actors in a post-processing pass
— tobAI lutke (@tobi) March 29, 2023
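That recasting pass could look roughly like this with Hugging Face `diffusers`; the placeholder-token scheme and the `recast_prompt` helper are illustrative assumptions:

```python
def recast_prompt(prompt, dummy_token, actor_token):
    # Swap the dummy actor's placeholder token for the LoRA-finetuned
    # virtual actor's trigger token before regenerating the shot.
    return prompt.replace(dummy_token, actor_token)

def recast_shot(base_model_id, actor_lora_path, prompt):
    # Load a base diffusion pipeline, attach the actor's LoRA weights,
    # and regenerate the shot. Assumes `diffusers` and `torch` are installed.
    import torch
    from diffusers import DiffusionPipeline
    pipe = DiffusionPipeline.from_pretrained(base_model_id, torch_dtype=torch.float16)
    pipe.load_lora_weights(actor_lora_path)
    return pipe(prompt).images[0]
```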
So, soon you will be able to describe a scene, get a movie script to edit, assign virtual actors, add a cinematographic direction and a sound design prompt, and get a full draft movie back overnight. Further editing can be structured as a chat.
— tobAI lutke (@tobi) March 29, 2023