Why text-to-video AI has missed the expectations of storytelling creatives.
As summer 2024 approaches, the AI gold rush has hit fever pitch. Sitting in a small coffee shop in Midtown Manhattan, typing away on my iPad, I overhear two colleagues discussing AI's impact on their finance jobs and the possibilities it holds for their futures.
The flood of AI products and AI-driven projects entering the market in early 2024 has sparked plenty of curiosity, confusion, and misconception.
One particular area that seems to be on the minds of creatives from New York to Hollywood is AI-generated movies. In fact, it caused enough confusion and anxiety to bring the entertainment industry to a halt during last year’s SAG-AFTRA and WGA negotiations.
Despite what seems to be a daily release of new products promising to take storytellers to the next level, most of what's been delivered doesn't seem to resonate with customers. This raises the question: why is AI failing our sky-high expectations? Did we buy into the hype, or was the future overpromised?
The Iteration Problem
Until now, we've mostly seen the ability to capture a text prompt and generate short clips or sequences of short stills or clips. While this has yielded some incredible results, things that were unimaginable as recently as six months ago, making tweaks to execute the user's vision can require heavy manipulation of the output.
This requires storytellers to spend more time identifying and correcting the AI’s misses — converting skilled writers into story editors and making them, arguably, less efficient.
“Making a movie is all about iterating. It’s iteration. And if you can’t iterate on one of these, I don’t know how you would possibly use it in production,” says Craig Good, a former Pixar animator who has credits on Toy Story and Finding Nemo, in a recent interview with Quartz.
I interpret Craig’s quote as an issue with AI accurately translating his vision into a usable output. This could mean that he’s either had issues prompting the tool he’s using or, more likely, the technology hasn’t been able to recreate his intended vision.
Given Good's professional credits, I have to believe his craft places him in a very small circle of world-class creatives. Making this assumption, I'm led to believe the AI doesn't understand the nuance of human creativity.
And yet, today's AI models can often parse and comprehend text remarkably well, sometimes rivaling human readers. So how does this combination lead to such an efficiency loss?
The Storyteller’s Dilemma
Many of the products rushing to market today are built around what current AI models can do: output high-resolution, short-form content.
The result has been a saturation of products that largely accomplish the same thing and fail to meet the needs of users like Craig.
But the goal was never to iterate on every single frame. That would border on the impossible for a full-length feature film.
The goal for Craig and creatives like him is to translate their vision without paying a cost in lost efficiency.
This outcome is achievable, even for projects on the scale of Toy Story and Finding Nemo, but it requires tools that focus on creatives' roles and goals rather than converting professionals into prompt editors.
As the AI summer continues to churn out new products and new avenues for creatives, we'll begin to see an evolution in the way AI video lifts up creatives.
"Every shot and frame counts, and these ideas throw that out the window," a long-time animator-turned-engineer messaged me this morning. But how soon can we return efficient creative control to the creatives?