Category Archives: Development/RTsoft

Random stuff I’m working on.

A blog post detailing my obsessive dive into generative AI

An image that says "Seth's AI tools" next to a bad-ass skeleton that was generated by stable diffusion

Over the last few months I created a fun little toy called Seth’s AI Tools (creative name, huh?). It’s an open-source Unity program that has become a playground for me to test a mishmash of AI stuff.

If you click the “AI Paintball” button inside of it, you get the thing shown in the youtube video above.

This shitty game proof of concept generates every character sprite immediately before it’s used on-screen, based on the subject entered by the player. None of the art is included in the download. (well, a few things are, like the forest background and splat effects – although I did make those with this app too)

It’s 100% local and does not use any internet functionality. (behind the scenes, it’s using Stable Diffusion, GFPGAN, ESRGAN, CLIP interrogation, and DIS, among other ML/AI tech)

If I leave this running for twelve days, it will have generated and displayed over one million unique images during gameplay. (twelve days is about a million seconds, and it makes roughly one image per second)

What can generative art bring to games?

Well, I figured this test would be interesting because having AI make unlimited unique but SIMILAR images of your opponents & teammates and popping them up randomly forces your brain to constantly make judgment calls.

You can never memorize the art patterns because everything is always new content. Sounds tiring now that I think about it.

If you don’t shoot an opponent fast enough, they will hit you. If you hit a friendly, you lose points.

Random thought: It might be interesting to render a second frame where I modify the first image and force a “smile” on it or something, but the whole thing looks like a bad flash game and I got kind of bored of working on it for now.

The challenge of trying to use dynamic AI art inside of a game

It’s neat to type in “purple corndog” and get a brand new picture in seconds. But as far as gamedev goes, what can you really do with a raw AI-created image on the fly?

Uhh… I guess you could…

  • Show pictures in a frame on a wall
  • Simple art for a “find the matching tiles” or a match three game
  • Background art, for gameplay or a title screen
  • Texture maps (can be tiled)

Your options are kind of limited.

To control the output better, one trick is to start with an existing image and use a mask so new data is only generated in certain parts. This gives you a lot more control: for example, you could change someone’s shirt without touching their face.
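
For the curious, here’s roughly what masked inpainting looks like in Python using Hugging Face’s diffusers library. (this is not what my app literally runs – it talks to my server – but it’s the same idea; the file names here are just placeholders)

```python
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

# Load an inpainting-tuned Stable Diffusion model (needs a decent GPU).
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

# Mask convention: white = generate new pixels, black = keep the original.
# So here: white over the shirt area, black over the face.
init_image = Image.open("person.png").convert("RGB").resize((512, 512))
mask_image = Image.open("shirt_mask.png").convert("RGB").resize((512, 512))

result = pipe(
    prompt="a colorful hawaiian shirt",
    image=init_image,
    mask_image=mask_image,
).images[0]
result.save("person_new_shirt.png")
```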

I used this technique for my pizza screensaver test – I generated a pizza to use as a template once, then asked the AI to only fill in the middle of it (inpainting) without touching the outer crust. This is why every pizza has the same crust.

It works pretty well because I can hardcode the alpha mask to use, so it’s a nice circle-shaped sprite and I don’t have to worry about shapes and edges at all. (see video below)
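
Since the mask never changes, building it once is trivial. A sketch with PIL (the sizes here are made up, not what I actually use):

```python
from PIL import Image, ImageDraw

SIZE = 512   # resolution the generator works at
CRUST = 60   # hypothetical crust width in pixels to leave untouched

# Inpainting convention: white = regenerate, black = keep as-is.
mask = Image.new("L", (SIZE, SIZE), 0)            # keep everything by default
draw = ImageDraw.Draw(mask)
draw.ellipse(
    (CRUST, CRUST, SIZE - CRUST, SIZE - CRUST),   # the toppings area
    fill=255,                                     # regenerate only the middle
)
mask.save("pizza_inpaint_mask.png")

# The same trick gives the sprite its circular alpha channel: draw a
# full-pizza ellipse and paste it into the alpha of the generated image.
```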

The “pizza” button in Seth’s AI tools. Every single pizza is unique and generated on the fly.

But with a newer technique called Dichotomous Image Segmentation (DIS) that I hacked in a few days ago, I can now create an alpha-masked sprite dynamically in real-time. (a sprite being an object/creature image with a transparent background)

Using DIS works much better than other tests I did trying to use chroma or luma keying. It can pick up someone in a green shirt in front of a green background, for example.

It’s a generally useful thing to have around, even if it isn’t perfect. (and like with everything in this field, better data from more training will improve it)
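
Once a segmentation model like DIS spits out a grayscale mask of the subject, turning that into a sprite is just a matter of stuffing the mask into the alpha channel. A minimal sketch with PIL (the file names are made up, and any matting/segmentation model could produce the mask):

```python
from PIL import Image

def make_sprite(rgb_path: str, mask_path: str) -> Image.Image:
    """Combine a generated image and a subject mask into an RGBA sprite.

    The mask is grayscale: white = subject (opaque), black = background
    (transparent). DIS, or any background-matting model, can produce it.
    """
    sprite = Image.open(rgb_path).convert("RGBA")
    mask = Image.open(mask_path).convert("L").resize(sprite.size)
    sprite.putalpha(mask)
    return sprite

make_sprite("skeleton.png", "skeleton_mask.png").save("skeleton_sprite.png")
```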

This video shows a valid use: (I call it “removing background” in the video below, but it’s the same thing)

This shows how the “remove background” button works NOT in the game

Now moving on to the AI Paintball demo.

This isn’t a Rorschach ink blot test, it’s the starting shape I use to create all the characters in the AI Paintball test.

This image is the target of inpainting with a given text prompt; the background is then removed (by creating an alpha mask of the subject) and voilà, there’s your chipmunk, skeleton, or whatever, ready to pop up from behind a bush.

A note on the hardware I’m using to run this

I’m using three RTX 3090 GPUs, which is how I can generate an image per second or so. It also means that simply playing this game or using the pizza screensaver draws 1000+ watts of power on my system.

In other words, it’s the worst, most inefficient screen saver ever created and you should never use it as one.

If you only have one GPU, the game/pizza demo will look much emptier as it will be slower to make images. (this could be worked around by re-using images, but this kind of thing isn’t really for mass consumption anyway so I didn’t worry about it)

Oh, want to run my AI Tools server + app on your own computer?

Well, it’s a bit convoluted, so this is only for the dedicated AI lovers who have decent graphics cards.

My app requires that you also install a special server; this allows the two pieces to be updated separately and offloads the documentation on installing the server to others. (it can be tricky…)

There are instructions here, or google “automatic1111 webui setup tutorial for windows” and replace where they mention https://github.com/AUTOMATIC1111/stable-diffusion-webui with https://github.com/SethRobinson/aitools_server instead.

The setup is basically the same because my customized server *is* that one, just with a few extra features added, plus checks to ensure it hasn’t broken compatibility with my tools.

The dangers of letting the player choose the game subject dynamically

The greatest strength and weakness of something like this is that players enter their own descriptions and can shoot at anything or anyone they want.

A shirtless Mario, something I created as an, uh, example of what you shouldn’t do. Unless that’s your thing, I mean, nobody is going to know.

Unfortunately, stable diffusion weight data reflects the biases and stereotypes of the internet in general because, well, that’s what it’s trained on. Turns out the web has become quite the cesspool.

Tim Berners-Lee would be rolling in his… oh, he’s still alive, actually, which really underscores how quickly everything has changed.

The pitfalls are many: for example, if someone chooses the opponent “terrorist”, you can guess what ethnicity the AI is going to choose.

Entering the names of well-known politicians and celebrities works too – there is no end of ways to create something offensive to someone with just a few keystrokes.

Despite this being a silly little tech demo nobody will see, I almost changed the name to “Cupid’s Arrows”, where you shoot hearts or something, in an effort to side-step the ‘violence against X’ issue, but that seemed a bit too… I don’t know, condescending and obvious.

So I went with a paintball theme as a compromise – at least nobody is virtually dying now.

The legality of AI and the future

Well, this is my blog so I might as well put down some random thoughts about this too.

AI image generation is currently in the hot seat for being able to mimic popular artists’ styles and create copyrighted or obscene material more easily than ever before. (or for a good time, try both at once)

The Stable Diffusion data (called the weights) is around 4 GB, or 4,294,967,296 bytes. ALL images are created using only this data. It’s reportedly trained on 2.3 billion images scraped from around the internet.

Assuming that’s true, 4,294,967,296 bytes divided by 2.3 billion images comes out to less than two bytes per image on average. *

Two bytes is only enough space to store a single number between 0 and 65535. How can all this be possible with roughly one number per image?! Well, it’s simple: it’s merely computing possibilities in noise space that are tied to tokens which are tied to words and… uh… it’s all very mathy. Fine, I don’t really get it either.
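
The math, if you want to check it yourself:

```python
weights_bytes = 4 * 1024**3       # 4 GB = 4,294,967,296 bytes of weights
training_images = 2_300_000_000   # reportedly ~2.3 billion training images

print(weights_bytes / training_images)   # ≈ 1.87 bytes per image
```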

This data (and code to use it) was released to the public for free and is responsible for much of the explosion we’re seeing now.

Our copyright system has never had to deal with concepts like “AI training”. How would it ever be feasible to get permission to use 2.3 billion images, and is it really necessary if it results in only a couple of bytes of data per image?

I’m hoping legally we end up with an opt-out system instead of requiring permission for all training, because keep this in mind: if you want to remove someone from a picture or upscale it, the AI will do the best job if it’s been trained on similar data. Using crippled data sets will make things less useful across the board.

To remove the birdy, the AI has to understand faces to fill in the missing parts.

Copyright as it applies to AI needs to evolve as fast as the technology, but that’s unlikely to happen. We have to find a balance that protects IP without hamstringing humanity’s ability to use and create the most amazing thing since mp3s.

Image generation has gotten a lot of attention because, well, it’s visual. But the AI evolution/revolution happening is also going to make your phone understand what you’re saying better than any human and help give assistance to hurricane victims.

Any rules on what can and can’t be used for training will have implications far beyond picture tools.

* it’s a bit more complicated, as some images are trained at a higher resolution, a celebrity’s face or a popular artist’s work may appear in thousands of images, etc.

Uh, anyway

So that’s what I’ve been playing with the last few months. Also doing stuff with GPT-3 and text generation in general (Kobold-AI is a good place to start there).

Like any powerful tool, AI can be used for good or evil, but I think it’s amazing that an art pleb like me can now make a nice apple.

It’s still early, improvements are happening at an amazing pace, and it’s going to get easier to use and install on every kind of device – but consider that a warning as much as a promise.

How to create Simon Belmont with DALL·E 2

Simon Belmont as he appears in Castlevania: Grimoire of Souls (Src: Wikipedia)

This morning OpenAI changed the rules – we can share pictures with faces now! To celebrate, I figured I’d have DALL·E create a real-life photo of Castlevania hero Simon Belmont. He should look something like the above picture, right?

I’ll just enter the name and the style of photo I want and with the magic of AI we get…

“Simon Belmont , Professional Photograph in the studio, perfect lighting, bokeh”

…some bikers and Neo wannabes. DALL·E has been programmed to ignore (?) famous people, and I guess that extends to fictional characters as well. I had poor results with Mickey Mouse and Shrek too.

It will never closely duplicate a celebrity’s face, or anybody’s face for that matter; it will only output greatly “mixed” things. (this is a legal/ethical choice rather than a technological limitation, I believe)

So the secret is to forget the name and craft a worthy sentence describing the target in textual detail. Actually, I get slightly better results including the name, so I’ll keep that too.

As a representative of lazy people everywhere, I’ll use OpenAI’s GPT-3 DaVinci to create the description for me. (Their text AI tools have no qualms referencing famous people or anything else)
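
If you’d rather do this programmatically than in the Playground, the same request through OpenAI’s Python library looks roughly like this. (the exact prompt wording is just my guess at recreating what I typed)

```python
import openai

openai.api_key = "sk-..."  # your OpenAI API key

response = openai.Completion.create(
    engine="text-davinci-002",  # GPT-3 "DaVinci"
    prompt=(
        "Describe the physical appearance and clothing of Simon Belmont "
        "from Castlevania in one detailed paragraph."
    ),
    max_tokens=150,
    temperature=0.7,
)

# Paste this description into DALL·E 2 along with style keywords.
print(response.choices[0].text.strip())
```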

Perfect. Now we feed the AI created description into DALL·E and get…

“Simon Belmont is a tall and muscular man with long, flowing blond hair. He has piercing blue eyes and a chiseled jawline. He typically wears a red tunic with a white undershirt, brown trousers, and black boots. He also wears a red cape and golden cross around his neck, Professional Photograph in the studio, perfect lighting, bokeh”

Well, much closer. You know, we should have added a whip.

The quality stands up pretty well at full resolution too:

What a hero! We may have found the box art for Dink Smallwood 2… ! Or a romance novel. Oh, wait, we can’t use any of this generated stuff commercially yet, too bad.

Add an eye patch for Goro Majima Belmont

Conclusion

Being a skilled writer (unlike the person typing) will probably result in better images. All those pages of boring descriptive prose in The Hobbit would create masterpieces!

I’ve been dabbling with creating creature sprites/concept art to fit existing games (like Dink Smallwood), but inpainting techniques have not been producing good results yet. Still learning and playing with things.

More DALL·E 2 fun – replacing things in existing photos

Inpainting tests

Inpainting is a technique where you start with a picture (in this case me) and ask the AI to replace a specific part of it. My first test was to try to add a second nose to myself, so I could smell things better. (All pictures/modifications created with OpenAI’s DALL·E 2 – all “original” pics used for inpainting are owned by me)

“Man with second nose growing out of forehead”

Um, not what I was going for.

Let’s stay away from the face and try to change Akiko’s dress. I use the brush to remove the area below her neck, slightly larger than the area of her dress.

“A beautiful elegant blue dress”

Incredible. It also filled in small parts of the person/stroller that is out of focus behind her as needed.

Retroactively changing clothes in any picture? Useful.

Another test. We’ll replace my son’s baseball glove with a pizza.

These results aren’t as convincing (I should have given it more arm to replace for a more natural pose I think), but I do appreciate how it was able to add missing detail to his hair after the glove was removed.

Well, you know what I have to do…

“Extremely muscled handsome man without a shirt”

Hmm. Not great. I guess changing my body into one with muscles is just one step too far for today’s technology. My guess is DALL·E actually doesn’t know much about bare skin (nipples and belly button seem to be missing?) due to using a censored training set to stop it from… well, doing certain things. I’ll bet a suit works better, let’s try that.

“Three people wearing classy colorful suits.”

This time I did all three of us in one go. Not perfect – I wonder if it would work better if I did it in three separate operations, one for each person? Hrm. If you’re curious how I set which areas it could fill, it looked like this:

I left the holding hands in as a constraint for the image created.

Random Dall-e 2 test pictures

More random pics with the prompts used, some prompts created by friends.

“pyramid head on late night with conan o’brien”
Another try: “Still from silent hill movie, pyramid head on late night with conan o’brien, dramatic lighting”

“flaming headed monkey holding paperwork”
“hot dog being thrown down a hallway”
“winnie the pooh and Confucius in the delorean from back to the future”
“Winnie the pooh and Confucius in the DeLorean, a still from the movie back to the future, 4k”
“Winnie the pooh fixing the flux capacitor, still from the movie Back To the Future”

The incredible power of AI created media – illustrating an AI created story with DALL·E 2

The future is here

I’ve been interested in AI and computer-generated media forever. As a kid, I created a stupidly naïve “AI” on my Commodore 64: I just programmed text responses to hundreds of inputs. “Hello” gave back “Hello, how are you?”. “I’m fine” gave back “Great, me too.”, and so on. I proudly showed my parents how smart my computer was.

A hollow illusion then, reality today. From Eliza to Deep Blue, Watson to Siri, Deepfakes to GPT-3 (and friends), it’s all coming together to change our world at a blinding pace.

Deepfake test I did on my local computer a few years ago with DeepFaceLab

Do you know what two computer things I’ve been playing with? Elden Ring and OpenAI’s GPT-3 Playground. I spent about $70 on each last month. Does spending cash on AI seem weird?

It’s no exaggeration to say playing around with it is one of the most engrossing and creative experiences I’ve had with computers to date.

OpenAI Inc/OpenAI LP are the big dogs in the field and carefully police usage. You can probably think of dozens of ways this technology (both the text creation and text to image tech of DALL-E) could be used for evil, spam, and misinformation.

Recently I’d been playing around with DALL·E mini and was in the process of setting up my own local server to try to allow higher quality work when I was granted access to the holy grail: DALL·E 2.

Let’s have the AI generate images for my old text-only game

In 1989 I started work on the BBS door-game Legend Of The Red Dragon. It’s a text game. What would happen if I took text from that game and asked AI to draw pictures for it from only the game text?


Let’s try its opening text as a prompt:

“You wake up early, strap your Short Sword to your back, and head out to the Town Square, seeking adventure, fame, and honor.”

Huh. Looks like it could use more direction. Let’s add “In the style of 3d rendered adventure game.”

“You wake up early, strap your Short Sword to your back, and head out to the Town Square, seeking adventure, fame, and honor. In the style of 3d rendered adventure game.”

Not bad. How about the Red Dragon Inn? Wonder how long these text prompts can be, let’s try the whole thing.

“You enter the inn and are immediately hailed by several of the patrons. You respond with a wave and scan the room. The room is filled with smoke from the torches that line the walls. Oaken tables and chairs are scattered across the room. You smile as the well-rounded Violet brushed by you…”

Well, the raw weird prose doesn’t seem to work that well. The AI isn’t given enough information to know it isn’t modern day. What if I change it around a little bit… (in theory you could use AI to rewrite the sentence to not be first person and add keywords to help with theme and historical era)

Note: I blocked out a woman’s face – I thought the rule was we can’t show them, but maybe we can; I need to check the DALL·E 2 rules again.

“A painting of the medieval Red Dragon Inn. The room is filled with smoke from the torches that line the walls. Oaken tables and chairs are scattered across the room. Violet the barmaid smiles.”

Let’s try a different visual style.

“A photo of the medieval Red Dragon Inn. The room is filled with smoke from the torches that line the walls. Oaken tables and chairs are scattered across the room. Violet the barmaid is working.”

Hmm, it’s obvious that I could get better results if I took more care in the prompt text, but nifty anyway.

I could see it being fun to play old text games with AI-generated images. I don’t see how to control DALL·E 2 with an API at the moment, otherwise I might try modifying an Infocom interpreter to automatically fetch images during play.

The 10-20 seconds it takes to generate an image wouldn’t be fun live, but how cool would it be to see “a bucket is here” and have it appear/disappear from the image as you play the game?

The big problem is uniformity of style – but there are some tools dealing with this I haven’t played with yet. (starting with an uploaded photo, for example)

Let’s use AI for everything

How about using AI to help generate a brand new story, then illustrating it too?

Here is a test. The text with the white background I typed. The text with the green background was generated by AI. (Specifically, OpenAI.com’s text-davinci-002 engine)

Ok, we now have two characters. Now, we keep this text and continue with more prompts, interactively pulling out more details. We can always undo and try different prompts if needed.

Ok, now let’s send these descriptions to DALL·E 2 to create AI-based visual representations of the story the AI created. First let’s do Feivel’s house:

“Feivel’s house is small but cozy. It is made of sticks and stones, with a thatched roof. There is a small fireplace in one corner, and a bed in another. A few shelves hold some of Feivel’s belongings, including his treasured map of the area.”

Not bad. I like the sixth image because I can see the treasure map on the chair.

Let’s do the Thimble description next. This time I’ll add “Realistic photo” at the end to specify the kind of image we want.

Thimble is an elderly mouse with gray fur. She is small and frail, but her eyes are bright and full of wisdom. She wears a simple dress and a scarf around her neck. She walks with a cane, but despite her age, she is still quite spry. Realistic photo.

Hmm. The cane didn’t seem to quite make it. This story seems like it might make a good children’s book. Let’s add “by Richard Scarry” to get that type of art style.

“Thimble is an elderly mouse with gray fur. She is small and frail, but her eyes are bright and full of wisdom. She wears a simple dress and a scarf around her neck. She walks with a cane, but despite her age, she is still quite spry. By Richard Scarry.”

Definitely got a children’s book style! The cane is now in every picture. I like this style.

I can ask for more variations in this style:

Writing a story with the characters the AI created

Hmm. Ok, we’ve got our stars, let’s have the AI write a story using them. I’m adding “Write an amusing children’s book about the above characters with a twist ending. Chapter 1:” to the end of the text we’ve already generated. (Again, green parts were created by the AI)

Well, uh, it’s a story. There are ways to coax out longer and more interesting things but this is fine for this test. Just for fun, let’s see if we can create artwork for the amazing battle scene of the giant mouse trap catching cats. I’m going to cheat and use my own descriptions for the prompt.

“Evil cats that wear clothes being cause in a giant mouse trail as a tiny clothed hero mouse strikes a victory pose in detailed colored pencil”

Uh, ok, obviously that prompt isn’t great, as it looks like a cat is being hit with colored pencils. I’m showing you my failures here, not just the good ones! Let’s forget the mouse and just focus on the cats and the mousetrap.

“Evil cats being caught in a giant mousetrap, in surrealistic art style.”

These are pretty wild! Some of the faces are… I don’t know, it may have actually tried to draw them injured by a mousetrap; in retrospect this could trigger unintentionally gory results, especially if I used ‘photorealistic’ as a keyword.

Let’s move to safer ground and create an image for the happy (?) ending.

“An old clothed grandma mouse with a cane holding hands with a brave little boy mouse . Art by Richard Scarry”
The end!

Random fun with DALL·E 2

These are just various pictures created with DALL·E 2 and the text prompts used. It’s very interesting to see the AI’s interpretations. Special thanks to Kevin Bates for brainstorming these prompts with me. It’s addicting, I can’t stop!

Note: The six-image pic shows the prompt used, then I do some “closeups” of the more interesting ones. It’s really fast to do it this way – sorry it’s not nicer, with each little pic clickable.

“portable open source video game system”
Not real sure about this D-PAD design
Don’t steal these designs, Nintendo
“the droids from the movie starwars”

“R2D2 with arms and legs giving a high-five, zoomed out, photo”

“Ewok from the movie Return of the Jedi in a bikini”

“surrealistic photo of a puppy waring a VR helmet in a futuristic spaceship”

“the abstract concept of having free will”
“fisher price guillotine”
“golden gate bridge in the style of an oriental scroll”

In Summary..

Well, I’ve put way too many pictures in this post so I’ll end it here. The AI models I used are top of the line and have many usage restrictions, but it’s only a matter of time before similar things are available to everyone – Good or evil, unrestricted. I’m simultaneously excited and worried.

If you want to play around with generating images yourself, try DALL·E mini. Its output isn’t as impressive but it’s still fun and interesting to play with.

Adding a cool custom C++ class template to Visual Studio 2022

Ok, this is one of those posts that exist mostly to document something, so the next time I install VS on a computer I remember how to do this and can find the file.

If you don’t use Visual Studio C++ then you should run. Or, prepare to dive into the incredibly boring world of the lengths programmers will go to just to avoid monotonous typing.

I’m all about classes. When you think of classy, think of me. Any time I do anything, I need a .cpp and .h file for it. And I want ’em set up pretty with the current date and stuff. Automagically!

See that? If I specify “GameProfileManager”, these two files get made with these contents. That’s all I’m trying to do, as I make a LOT OF CLASSES.

For years I used custom macros I’d made for Visual Assist, but at $279 plus $119 yearly, it’s a bit hard to keep justifying VA when VS 2022 has most of its features built in these days.

Yes, you can pay less, but rules like the payment “Cannot be reimbursed by a company or organization” and the world’s most annoying licensing system make it not worth it.

So what’s a lad to do? Well, I finally got off my bum (well, onto it, I guess) and learned how to get the same macros using features VS has had built in forever. They work a tiny bit differently, but close enough.

Hint: Don’t even try to use the “Snippets” feature to do this, it won’t work.

How to add

Download this file and place it in your %USERPROFILE%\Documents\Visual Studio 2022\Templates dir.

(or %USERPROFILE%\Documents\Visual Studio 2019\Templates if you’re using that)

DON’T UNZIP! VS reads it as a zip.

Close and restart Visual Studio.

Then add new classes using the “Add item” menu. Like this shittily produced unlisted video shows: (can you guess the music I used? Why did this even need music? Are you wondering what youtube video I was watching?)

How to change it

Great! Ok, but you probably wanted to customize it with your own name and not give me credit to everything you make. Well, fine, if you must.

To modify it with your own name/email/text, just unzip the file you downloaded – you’ll see four files inside. Delete the original .zip. (you’ll make your own in a sec)

Edit the .cpp and .h as you like, then zip the four files up again. (I don’t think the name matters, it just has to be in the same directory as before) Restart VS and you should see the changes.

When editing, there are many special words you can use that get replaced automatically (full list here).

Some useful ones:

  • $safeitemname$ – (current filename without the .cpp or .h part)
  • $projectname$ – The name of the project this is in
  • $year$ – Current year (e.g., 2022)
  • $time$ – The current time in the format DD/MM/YYYY 00:00:00
  • $username$ – The current user name
  • $registeredorganization$ – The registry key value from HKLM\Software\Microsoft\Windows NT\CurrentVersion\RegisteredOrganization
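
For example, a stripped-down templated .h might look something like this (a minimal sketch – the real files in the download have more boilerplate):

```cpp
// $safeitemname$.h - part of $projectname$
// Created by $username$ on $time$
// Copyright (c) $year$ $registeredorganization$

#pragma once

class $safeitemname$
{
public:
    $safeitemname$();
    virtual ~$safeitemname$();
};
```

Add a class named GameProfileManager and every $safeitemname$ above becomes GameProfileManager in the generated file.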

Hint: Edit the MyTemplate.vstemplate file to change the name and some other things. Useful if you want multiple templates, one for component classes, etc.

And that’s it! Suck it, Visual Assist. But also I love you so please get cheaper and less annoying to buy & license.