The incredible power of AI created media – illustrating an AI created story with DALL·E 2

The future is here

I’ve been interested in AI and computer generated media forever. As a kid, I created a stupidly naïve “AI” on my Commodore 64. I just programmed text responses to hundreds of inputs. “Hello” gave back “Hello, how are you.” “I’m fine” gave back “Great, me too.” And so on. I proudly showed my parents how smart my computer was.

A hollow illusion then, reality today. From Eliza to Deep Blue, Watson to Siri, between Deepfakes and GPT-3 (and friends), it’s all coming together to change our world at a blinding pace.

Deepfake test I did on my local computer a few years ago with DeepFaceLab

Do you know which two computer things I’ve been playing with lately? Elden Ring and OpenAI’s GPT-3 Playground. I spent about $70 on each last month. Does spending cash on AI seem weird?

It’s no exaggeration to say playing around with it is one of the most engrossing and creative experiences I’ve had with computers to date.

OpenAI Inc/OpenAI LP are the big dogs in the field and carefully police usage. You can probably think of dozens of ways this technology (both the text creation and the text-to-image tech of DALL·E) could be used for evil, spam, and misinformation.

Recently I’d been playing around with DALL·E mini and was in the process of setting up my own local server to try to allow higher quality work when I was granted access to the holy grail: DALL·E 2.

Let’s have the AI generate images for my old text-only game

In 1989 I started work on the BBS door-game Legend Of The Red Dragon. It’s a text game. What would happen if I took text from that game and asked AI to draw pictures for it from only the game text?


Let’s try its opening text as a prompt:

“You wake up early, strap your Short Sword to your back, and head out to the Town Square, seeking adventure, fame, and honor.”

Huh. Looks like it could use more direction. Let’s add “In the style of 3d rendered adventure game.”

“You wake up early, strap your Short Sword to your back, and head out to the Town Square, seeking adventure, fame, and honor. In the style of 3d rendered adventure game.”

Not bad. How about the Red Dragon Inn? Wonder how long these text prompts can be, let’s try the whole thing.

“You enter the inn and are immediately hailed by several of the patrons. You respond with a wave and scan the room. The room is filled with smoke from the torches that line the walls. Oaken tables and chairs are scattered across the room. You smile as the well-rounded Violet brushed by you…”

Well, the raw weird prose doesn’t seem to work that well. It isn’t given enough information to know it isn’t modern day. What if I change it around a little bit… (in theory you could use AI to rewrite the sentence to not be 1st person and add keywords to help with theme and historic era)

Note: I blocked out a woman’s face because I thought the rule was that we can’t show them. Maybe we can; I need to check the DALL·E 2 rules again.

“A painting of the medieval Red Dragon Inn. The room is filled with smoke from the torches that line the walls. Oaken tables and chairs are scattered across the room. Violet the barmaid smiles.”

Let’s try a different visual style.

“A photo of the medieval Red Dragon Inn. The room is filled with smoke from the torches that line the walls. Oaken tables and chairs are scattered across the room. Violet the barmaid is working.”

Hmm, it’s obvious that I could get better results if I took more care in the prompt text, but nifty anyway.
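The hand edits I made to the inn text could be roughed out in code. This is only a sketch of the idea, not something I actually ran against DALL·E 2: the function name `build_image_prompt` and its parameters are made up for illustration, and dropping every sentence that starts with “You” is a crude stand-in for a real rewrite.

```python
import re


def build_image_prompt(description, subject, style="A painting of", era="medieval"):
    """Turn second-person game text into a scene description (hypothetical helper)."""
    # Split on sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", description.strip())
    # Keep the sentences that describe the scene, not what the player does.
    scene = [s for s in sentences if s and not s.lower().startswith("you ")]
    # Lead with the visual style, the era, and the subject so the model has context.
    lead = f"{style} the {era} {subject}."
    return " ".join([lead] + scene)


inn_text = (
    "You enter the inn and are immediately hailed by several of the patrons. "
    "You respond with a wave and scan the room. "
    "The room is filled with smoke from the torches that line the walls. "
    "Oaken tables and chairs are scattered across the room."
)
print(build_image_prompt(inn_text, "Red Dragon Inn"))
```

Swapping `style` to “A photo of” gives the second variation without touching the rest of the text.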

I could see it being fun to play old text games with AI generated images. I don’t see a way to control DALL·E 2 through an API at the moment; otherwise I might try modifying an Infocom interpreter to automatically fetch them during play.

Waiting 10-20 seconds for each image wouldn’t be fun live, but how cool would it be to see “a bucket is here” and have the bucket appear in or disappear from the image as you play the game?
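One way to soften the wait would be to generate each scene once and reuse it whenever the same text comes up again. Here is a minimal sketch of that caching idea. Everything in it is an assumption: `SceneImageCache` is a made-up name, and `generate_image` is a placeholder for whatever function would turn a prompt into image bytes, since I don’t have an API to call.

```python
import hashlib
from pathlib import Path


class SceneImageCache:
    """Store one image file per unique prompt (hypothetical helper)."""

    def __init__(self, cache_dir, generate_image):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        # Placeholder callable: takes a prompt string, returns image bytes.
        self.generate_image = generate_image

    def path_for(self, prompt):
        # Hash the prompt so any game text maps to a safe file name.
        digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()[:16]
        return self.cache_dir / f"{digest}.png"

    def get(self, prompt):
        path = self.path_for(prompt)
        # Only pay the slow generation cost the first time a scene is seen.
        if not path.exists():
            path.write_bytes(self.generate_image(prompt))
        return path
```

With this, the room with the bucket and the room without it are simply two different prompts, so each gets its own cached picture.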

The big problem is uniformity of style – but there are some tools dealing with this I haven’t played with yet. (starting with an uploaded photo, for example)

Let’s use AI for everything

How about using AI to help generate a brand new story, then illustrating it too?

Here is a test. The text with the white background I typed. The text with the green background was generated by AI. (Specifically, OpenAI.com’s text-davinci-002 engine)

Ok, we now have two characters. Now, we keep this text and continue with more prompts, interactively pulling out more details. We can always undo and try different prompts if needed.
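The keep-the-text, add-a-prompt, undo-if-needed loop I’m doing by hand in the Playground could be modeled roughly like this. It is just a sketch: `StoryDraft` is a made-up class, and `complete` stands in for any function that takes the text so far and returns the model’s continuation. I’m not showing a real OpenAI call here.

```python
class StoryDraft:
    """Running story text built from typed prompts and generated replies."""

    def __init__(self, complete):
        # Placeholder callable: takes the full text so far, returns new text.
        self.complete = complete
        # Pieces alternate: typed, generated, typed, generated, ...
        self.segments = []

    def text(self):
        return "".join(self.segments)

    def ask(self, typed):
        """Append typed text, then append whatever the model continues with."""
        self.segments.append(typed)
        generated = self.complete(self.text())
        self.segments.append(generated)
        return generated

    def undo(self):
        """Drop the last typed/generated pair so a different prompt can be tried."""
        del self.segments[-2:]
```

The important part is that every new prompt is sent along with everything that came before it, which is how the model keeps the characters consistent.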

Ok, now let’s send these descriptions to DALL·E 2 to create AI-based visual representations of the story the AI created. First let’s do Feivel’s house:

“Feivel’s house is small but cozy. It is made of sticks and stones, with a thatched roof. There is a small fireplace in one corner, and a bed in another. A few shelves hold some of Feivel’s belongings, including his treasured map of the area.”

Not bad. I like the sixth image because I can see the treasure map on the chair.

Let’s do the Thimble description next. This time I’ll add “Realistic photo” at the end to specify the kind of image we want.

“Thimble is an elderly mouse with gray fur. She is small and frail, but her eyes are bright and full of wisdom. She wears a simple dress and a scarf around her neck. She walks with a cane, but despite her age, she is still quite spry. Realistic photo.”

Hmm. The cane didn’t seem to quite make it. This story seems like it might make a good children’s book. Let’s add “by Richard Scarry” to get that type of art style.

“Thimble is an elderly mouse with gray fur. She is small and frail, but her eyes are bright and full of wisdom. She wears a simple dress and a scarf around her neck. She walks with a cane, but despite her age, she is still quite spry. By Richard Scarry.”

Definitely got a children’s book style! The cane is now in every picture. I like this style.

I can ask for more variations in this style:

Writing a story with the characters the AI created

Hmm. Ok, we’ve got our stars, let’s have the AI write a story using them. I’m adding “Write an amusing children’s book about the above characters with a twist ending. Chapter 1:” to the end of the text we’ve already generated. (Again, green parts were created by the AI)

Well, uh, it’s a story. There are ways to coax out longer and more interesting things but this is fine for this test. Just for fun, let’s see if we can create artwork for the amazing battle scene of the giant mouse trap catching cats. I’m going to cheat and use my own descriptions for the prompt.

“Evil cats that wear clothes being cause in a giant mouse trail as a tiny clothed hero mouse strikes a victory pose in detailed colored pencil”

Uh, ok, obviously that prompt isn’t great as it looks like a cat is being hit with colored pencils. I’m showing you my failures, not just the good ones here! Let’s forget the mouse and just focus on the cats and the mouse trap.

“Evil cats being caught in a giant mousetrap, in surrealistic art style.”

These are pretty wild! Some of the faces are... I don’t know, it may have actually tried to draw them injured by a mousetrap. In retrospect, this could trigger unintentionally gory results, especially if I used ‘photorealistic’ as a keyword.

Let’s move to safer ground and create an image for the happy (?) ending.

“An old clothed grandma mouse with a cane holding hands with a brave little boy mouse . Art by Richard Scarry”
The end!

Random fun with DALL·E 2

These are just various pictures created with DALL·E 2 and the text prompts used. It’s very interesting to see AI interpretations. Special thanks to Kevin Bates for brainstorming these prompts with me. It’s addicting, I can’t stop!

Note: Each six-image picture shows the prompt used, followed by “closeups” of the more interesting results. It’s really fast to do it this way; sorry I didn’t make each little pic clickable.

“portable open source video game system”
Not real sure about this D-PAD design
Don’t steal these designs, Nintendo
“the droids from the movie starwars”

“R2D2 with arms and legs giving a high-five, zoomed out, photo”

“Ewok from the movie Return of the Jedi in a bikini”

“surrealistic photo of a puppy waring a VR helmet in a futuristic spaceship”

“the abstract concept of having free will”
“fisher price guillotine”
“golden gate bridge in the style of an oriental scroll”

In summary

Well, I’ve put way too many pictures in this post so I’ll end it here. The AI models I used are top of the line and have many usage restrictions, but it’s only a matter of time before similar things are available to everyone, for good or evil, unrestricted. I’m simultaneously excited and worried.

If you want to play around with generating images yourself, try DALL·E mini. Its output isn’t as impressive but it’s still fun and interesting to play with.