FEATURE: Eye on AI

Kidscreen concludes its series on artificial intelligence in the children's entertainment industry with a short film exercise.
November 3, 2023

By: Evan Baily

In 2016, filmmaker Oscar Sharp and creative technologist Ross Goodwin fed a stack of sci-fi screenplays into an AI, which generated a script for a short film titled Sunspring. Sharp then gathered a small cast and crew and shot it in one day. The screenplay is disjointed and confusing, but the film is a quirky master class in acting and directing, as director and cast imbue the nonsensical lines with subtext, meaning, comedy and emotion through performance, staging and camera. 

In the years since, generative AI has gone from exotic to mainstream. ChatGPT, which is much more complex than the model Goodwin and Sharp used, had 1.6 billion visits in June 2023, and its developer, OpenAI, has even more sophisticated language models up its sleeve. Runway AI, whose tools were used in the VFX pipeline for Everything Everywhere All At Once, just held its first annual AI Film Festival. And Thank You For Not Answering, a gorgeous short made with Runway’s Gen2 image-to-video tool, earned a glowing write-up in The New Yorker.

With all of that in mind, I set out to make an animated short using AI for this column. What follows is a chronicle of what happened. If you want to dive deeper, the assets I created, plus my detailed notes and the finished short, are all archived in a Google Drive folder at bit.ly/47MEvdl.

The plan 

To keep this project manageable, I decided that the short would consist of one simple gag: A MAN tells his (ordinary-looking) DOG to sit. Dog exits frame for a beat, then re-enters walking on two legs and carrying a chair. Dog places the chair on the floor and sits in it like a person. Tiny beat, then Dog crosses its legs like a human would. FIN. 

To create the shots for the film, I resolved to try out Runway’s Gen1 and Gen2 video generation tools and pick whichever one was best for the project. 

Gen1 

Gen1 is Runway’s video-to-video tool. Feed it footage plus a “style reference” image, and it re-renders the original shot to look like the image. Think of it as a souped-up Snapchat filter, or an off-the-shelf motion-capture setup that doesn’t require special equipment or rigged models. 

After a little bit of testing (see Google Drive folder), I ruled out Gen1. It felt like I’d be making this too easy—I’d be able to shoot material with the actual composition, blocking and performance I wanted, and then just modify the appearance of the footage. 

Gen2 

Gen2 is Runway’s text/image-to-video tool. Give it a text prompt and/or an image, and it renders a 4.5-second shot, which you can then extend up to three times. After some testing, I found that Gen2 wasn’t as reliable or consistent as Gen1. For example, given the prompt “camera cranes down from a high vantage point and finds a cute cartoon dog running through a field,” it returned a low-angle tracking shot of a cartoon dog floating/pulsing through a field. Not a slam dunk, but not a complete bust, either. I decided to give Gen2 a try.

The gag

Since Gen2 isn’t great at following text prompts (especially camera direction), I needed reference images to ensure continuity, consistent character models and proper shot hookups. I used Midjourney to generate images for Man and Dog, created a separate face for Man (since the full-body image of him was faceless), and comped them into a separately generated background to create a still of the master two-shot.

I tried feeding that reference image to Gen2 in conjunction with different text prompts, but it kept swapping the positions of Man and Dog, making Man face away from the (virtual) camera, and introducing other distracting elements. (Later, I figured out that Gen2 will actually adhere to a reference image, regardless of text prompt, if you make sure that the reference image rather than the text prompt is visible in the input window when you hit the “Generate” button.) 

After trying multiple times to get Man and Dog properly positioned in the shot and Dog walking out of frame, I realized that Gen2 couldn’t pull it off. New plan: I’d render out individual components of the shot, starting with Dog turning around, and then comp them together. That’s where things started to really go sideways. 

This time, Gen2 started with the reference image, but then it inexplicably cut away to another angle of the dog, which morphed into a headless four-legged monstrosity straight out of my childhood nightmares. When I tried to extend the shot to see what would happen next, Gen2 started the render, then informed me that the abomination it had created violated its content guidelines, and my account was at risk of being suspended. Cue the sad trombone. 

The pivot 

This just wasn’t working. Would a different design for Dog help? Using Midjourney again, I generated some retro ’50s-style takes on Dog (pictured). Just to see what would happen, I fed a few of them into Gen2 without text prompts. Right away, I started seeing interesting results. 

When I uploaded a pic of a dog sitting at a table, Gen2 made the camera crane up; the dog turned into a puppy and then into a teacup sitting in a saucer full of dog food; then a picture frame containing a human/dog hybrid wearing a frumpy dress appeared. It got even weirder from there. 

And that’s when I realized I needed to stop fighting Gen2 and resenting it for what it does badly—and instead study what it does well and lean into that. 

So I started generating more retro ’50s-style images featuring dogs and then letting Gen2 mess with them. The resulting shots were wonderfully surreal. You can see some of them, incorporated into the finished short, at bit.ly/3qyrbs0.  

Familiar & alien

In retrospect, what I needed to do was right there in front of me all along. Sunspring embraces the inherent weirdness of AI-generated material, but weaves in human choices to create something that’s familiar and alien at the same time. 

And when I went back and re-watched Thank You For Not Answering, I realized it was cut from the same cloth: morphing faces, bodiless heads, foreground objects merging into backgrounds, backgrounds calving off pieces of themselves into the foreground—and yet these surreal elements hold together as a film because of the way Paul Trillo, the filmmaker, curated and assembled the footage and tied it together with the monologue that runs over the whole thing. In The New Yorker, Trillo says the AI was “co-directing” with him. Rather than fighting it, he embraced its strangeness but wrapped it in his humanity.

That’s my biggest takeaway here: Filmmaking isn’t technology-agnostic. What we make will always be inflected by how we make it. In the case of AI, rather than forcing it to do the stuff we’ve done in the past, I’m interested to see what effects we can achieve by guiding—but also embracing—its alienness. 

Thanks for reading this! It’s the fourth and final installment in our limited AI series. I hope this and the other pieces were interesting and useful to you, and I’d love to hear any feedback you want to share (ai@conbail.com).

Evan Baily is a TV/film producer and showrunner who also consults for entertainment, media, consumer products and tech companies. Please send feedback to ai@conbail.com.

This story originally appeared in Kidscreen's October/November 2023 magazine.
