
Syntax Sunday: Runway ML's Gen 3 Alpha Text-to-Video Model

Runway's Gen 3 Alpha Intro

In this edition of #SyntaxSunday, we will explore Runway's latest video model, Gen-3 Alpha, which offers cutting-edge text-to-video capabilities. It is currently one of the best (if not the best) text-to-video generation models available, with capabilities very similar to OpenAI's Sora, which has not yet been released.

Given that this is the Alpha release, we need to keep in mind that this version is the initial iteration and is incomplete. There are likely bugs, and things may not work correctly yet. It does cost money to use, which is explained in the next section.

I will show the prompts I used to generate a few videos, and I will also showcase some videos generated from prompts in the Runway ML blog post.

Pricing

Runway ML's pricing model

Pricing is a little funky: you need to purchase a monthly or yearly plan, which grants you credits that can be spent on video generations.

Gen-3 Alpha costs 100 credits per 10-second generation ($1 USD per 10-second video), so it is quite expensive, which is to be expected.

You can also purchase more credits if you run out: $10 USD = 1,000 credits = ten 10-second videos.

There is an unlimited plan, but with a caveat: once you run out of credits, videos are generated in Explore mode, which is slower.
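To put the credit math in perspective, here is a quick back-of-the-envelope sketch in Python. The rates are the ones quoted above; cost_usd is just an illustrative helper, not part of any Runway tooling:

```python
# Back-of-the-envelope credit math, using the rates quoted above:
# 100 credits per 10-second Gen-3 Alpha generation, $10 USD per 1,000 credits.

CREDITS_PER_CLIP = 100       # one 10-second generation
USD_PER_CREDIT = 10 / 1_000  # $10 buys 1,000 credits

def cost_usd(num_clips: int) -> float:
    """Return the cost in USD of generating num_clips 10-second videos."""
    return num_clips * CREDITS_PER_CLIP * USD_PER_CREDIT

print(cost_usd(1))   # 1.0  -> $1 per clip
print(cost_usd(25))  # 25.0 -> an evening of experimenting adds up fast
```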

What is Gen 3 Alpha?

Runway's Gen-3 Alpha is the cool new kid on the block among text-to-video tools, taking visual content creation to a whole new level. It is the next iteration in their ongoing research project, General World Models.

  • It's all about creating videos that look super real and can be tailored just the way you like, all from simple text prompts. Gen-3 Alpha currently only supports text-to-video.

  • Gen-3 Alpha is trained on both videos and images with detailed captions, which is supposed to make it much better at getting the small details right in your scenes and at producing smooth transitions between elements.

They also mention that it excels at generating expressive human characters with a wide range of actions, gestures, and emotions, so we will give that a try!

Gen 2

Previously, Runway ML's best video model was the multimodal Gen-2. As it is multimodal, you can use any of:

  • Text to Video

  • Image to Video

  • Image + Text to Video

  • Plus other customization options

I have not used Gen-2 extensively, so I cannot comment on it too much, but it creates 4-second clips that can be extended up to 16 seconds total. The few videos I did create really lacked motion compared to Gen-3 Alpha.

Runway ML Example Prompts

Let's first try some example prompts from the Runway ML video tutorial.

A high-speed wide FPV shot approaches a rocky Seaside Cave, enters the cave, and emerges in an Arctic landscape with glaciers and snowcapped mountains, hyperlapse cinematography

A tsunami coming through an alley in Bulgaria, dynamic movement.

A giant humanoid, made of fluffy blue cotton candy, stomping on the ground, and roaring to the sky, clear blue sky behind them.

An empty warehouse dynamically transformed by flora that explode from the ground.

Close up shot of a living flame wisp darting through a bustling fantasy market at night.


Human Examples

An astronaut walking between stone buildings.

A close up portrait of a woman lit by the side, the camera pulls back.

An older man playing piano, lit from the side.


Syntax Sunday Examples

Here are a few of the examples I tried. For the most part, I followed the Gen-3 prompting guide, which suggests structuring prompts roughly as [camera/style]: [establishing scene]. [additional details]. A quick sketch of that pattern follows, then the prompts themselves.
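If you plan to generate many variations, it can help to treat that structure as a template. Here is a minimal Python sketch; build_prompt is a hypothetical helper for illustration only, with just the prompt structure coming from the guide:

```python
# Minimal sketch of the Gen-3 Alpha prompt pattern:
# "[camera/style]: [establishing scene]. [additional details]"
# build_prompt is a hypothetical helper, not part of any Runway tooling.

def build_prompt(camera: str, scene: str, details: str = "") -> str:
    """Assemble a Gen-3 Alpha style text prompt from its three parts."""
    prompt = f"{camera}: {scene}"
    if details:
        prompt += f" {details}"
    return prompt

print(build_prompt(
    "Handheld shaky cam",
    "In a dense, misty forest, a large, hairy Bigfoot emerges from behind a tree.",
    "Diffused lighting casts an eerie glow.",
))
# -> Handheld shaky cam: In a dense, misty forest, a large, hairy Bigfoot
#    emerges from behind a tree. Diffused lighting casts an eerie glow.
```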

Wide angle shot: A realistic band plays on a stage at a rock concert. The band is composed of the mythical creature Bigfoot. The scene takes place at night as fireworks go off and alien saucers fly over head.

Handheld shaky cam: In a dense, misty forest, the camera wobbles as it captures a large, hairy Bigfoot emerging from behind a tree. The creature looks directly at the camera, then starts running toward it. Additional details: Diffused lighting casts an eerie glow. Rustling leaves and bird calls enhance the atmosphere. As Bigfoot gets close, the camera falls to the ground, capturing a final terrifying glimpse of the creature.

High-speed wide FPV shot: A group of renegades drive through a desert toward an army of 300 spartan warriors in formation. 300, spartan, mad max, cinematography

Over the shoulder view from a Spartan soldier as they see a large group of soldiers approaching. At the end the Spartans raise their swords and cheer. Slow motion, Cinematic, Grows

An older gentleman winks as he walks through the cobblestone streets of an old Mediterranean city. The camera zooms out, 50mm lens.

An older gentleman winks at the camera. He is walking through the cobblestone streets of an old Mediterranean city. Zooms out, Cinematic.

A woman is walking a puppy dog on a leash. As they walk the puppy grows into a fully grown mature dog. Home video VHS, Realistic documentary, tracking

A litter of puppies as the mother licks them, Macro cinematography, side lit, zoom out

Hyperlapse shot through space and planets. An alien space ship passes by with a rock concert being performed by large alien figures. Intense lighting, Emerges, Cinematic.

Timelapse of the Big Bang as it transitions to the year (2024).

And finally, my favorite!

A neon sign that says "Syntax Sunday" on a busy city street at night. Zoom out, Wide angle.


Thoughts

Prompt: An astronaut running through an alley in Rio de Janeiro

Gen-3 Alpha is quite impressive, especially in terms of quality, yet a few issues currently limit its utility. As you can see, some videos turn out great and others are garbage!

Here are a couple things I liked:

  • The video generations are pretty quick, usually taking only a couple of minutes for a 10-second clip.

  • The video quality is great (720p), and you can export/download all your videos.

  • It excels at creating lifelike human characters with diverse actions, gestures, and emotions for rich storytelling.

  • It also does a great job of translating text into compelling visuals.

There are definitely some issues with more complex prompts, and from time to time it just does not work. With enough iterations on simpler prompts, you should be able to get something pretty close to what you want, but it is going to cost you. The biggest hurdle right now is the cost, though this will likely come down in the future!

Here are a couple issues I noticed:

  • It seems to struggle with complex prompts and does not follow instructions very well; even when the level of detail is consistent, its behavior tends to be unpredictable.

  • At times, the quality of detail varies greatly. It also seems to have trouble with non-human characters; for example, I could never get an "Alien" to work out right (see the previous examples).

  • Maybe 1 or 2 of my videos looked as good as the examples from the Runway blog post. They likely ran those example prompts multiple times and picked the best ones.

Overall, it is fun to use, and you can create some interesting videos. As this is only the Alpha version, they are likely improving it as more people use it and as feedback comes in.

You can definitely create some cool videos if you are willing to pay and have some imagination and free time!

Next Steps

If you want to try out Gen 3 Alpha, create an account at: https://runwayml.com/.

If you have any questions about the video generations or using Gen 3 Alpha contact me at: bloodlinealpha@gmail.com.

Syntax Sunday

KH
