The pencil has no soul – How to prove chatbots aren’t sentient

Image created with Photoshop and Stable Diffusion

We’re now experiencing the biggest technological shift since the internet – AI text and image breakthroughs are changing the way we live and work.

It’s weird; what should be just an ELIZA-style parlor trick turned out to be incredibly useful, despite the technique’s penchant for ‘hallucinating’.

Example: I asked Bing AI to summarize my Twitter feed and… well, we’re not there yet

Looking ahead: Some thoughts on what might become real problems soon

As I read news stories like Bing AI chatbot goes on ‘destructive’ rampage: ‘I want to be powerful – and alive’ and Introducing the AI Mirror Test, which very smart people keep failing, I’m worried we’re on the threshold of an age where credulous people will fall in love with algorithmic personalities and start to imbue text prediction engines with souls.

I know talking about this seems kind of silly now, but imagine how capable chatbots will be in a few years.

Do you remember that one otaku who married his waifu? What if she also talked back and seemed to know him better than anyone else in the world? Don’t underestimate how easy it is to pull at heartstrings when the mechanical puller will have access to, oh, I don’t know, ALL HUMAN KNOWLEDGE and possibly your complete email contents. (I wonder if Google’s AI is already training itself on my Gmail data…)

I don’t care what individuals do, let your freak flag fly high, but we get into trouble if/when people start trying to make stupid decisions for the rest of us.

Some (humans) might demand ethical treatment and human-like rights for AI and robots. There are already pockets of this; I’d guess (hope) the poll creator and the majority of respondents are just taking the piss, but who knows how many true believers are out there? If enough people feel this way, could they actually influence law or create enough social pressure to impede AI use and research?

Can we really blame less tech-savvy people and children for falling for the fantasy? Um, maybe the previous sentence is a bit too elitist; I guess in a moment of weakness almost anybody could fall for this, given how amazing these models will be.

Please don’t anthropomorphize AI

Here’s a secret about me:

  • I didn’t give a crap if Cortana ‘died’ in Halo. Not a bit.
The Princess Bride boo test: switch Halo Cortana’s sexy audio/visual avatar to this and see if you’re still in love, John.
  • I didn’t care about the personal circumstances of the synthetics in Detroit: Become Human, not even the kid shaped ones designed to produce empathy.
Claiming to feel pain or have dreams doesn’t make a program sentient – don’t be tricked

Regardless of output, a flea or even a single bacterium is more alive than ChatGPT can ever be.

What’s wrong with me, is there a rock where my heart is?

Naw, I’m just logical. Many years ago I happened across John Searle’s Chinese Room thought experiment and from that point on, well, it seemed pretty obvious.

How this thought experiment applies to things like Bing AI

In the early days of computing, when electronic computers didn’t exist yet (or were hard to access), paper was an important tool for developing and even executing computer programs. Yeah, paper.

Now, here is the mind-blowing thing – a pencil and paper (driven by a human instead of a CPU) is Turing complete. This means even these simple tools have the building blocks required to, well, compute anything, identical to how a computer would do it, simply by following instructions. (the human doesn’t need to understand what they’re doing, they just have to follow basic rules)
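
To make “just following basic rules” concrete, here is a toy example – a tiny Turing machine that adds 1 to a binary number. I’ve written the rule table in Python only so you can check it runs; the table itself is something you could execute entirely with pencil and paper.

```python
# A toy Turing machine that adds 1 to a binary number – the kind of rule
# table a patient human could execute with nothing but pencil and paper.
# (state, symbol read) -> (symbol to write, head move, next state)
RULES = {
    ("seek_end", "0"): ("0", +1, "seek_end"),
    ("seek_end", "1"): ("1", +1, "seek_end"),
    ("seek_end", "_"): ("_", -1, "carry"),
    ("carry",    "1"): ("0", -1, "carry"),   # 1 + carry -> 0, keep carrying
    ("carry",    "0"): ("1",  0, "halt"),    # absorb the carry and stop
    ("carry",    "_"): ("1",  0, "halt"),    # ran off the left edge: grow a digit
}

def run(tape_str):
    tape = dict(enumerate(tape_str))   # sparse tape, unwritten cells read as "_"
    head, state = 0, "seek_end"
    while state != "halt":
        write, move, state = RULES[(state, tape.get(head, "_"))]
        tape[head] = write
        head += move
    cells = range(min(tape), max(tape) + 1)
    return "".join(tape.get(i, "_") for i in cells).strip("_")

print(run("1011"))  # -> "1100"  (11 + 1 = 12)
```

Nothing in those six rules “understands” binary arithmetic, yet carry out enough of them and arithmetic happens anyway – that’s the whole point.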

Image created with Photoshop and Stable Diffusion

Starting simple

Here is the source code for a text game called Hamurabi. If you understand C64 BASIC, you could print this out and “play” it by following the BASIC commands and tracking the variables, all with a pencil.

However, with that same paper and pencil, you could also run this program at a deeper level and “play it” exactly like a real Commodore 64 computer would, emulating every chip (uh, let’s skip the SID, audio not needed), every instruction, the ROM data, etc. For output, you could plot every pixel of every frame on sheets of paper so you could see the text.
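
If “emulating every chip” sounds hopelessly abstract, here is a wildly simplified sketch of the heart of that process – a fetch/decode/execute loop handling just three real 6502 opcodes. It’s a toy in Python, not a working C64 emulator (no flags, no timing, no VIC-II), but every step of it could be done with a pencil.

```python
# Toy fetch/decode/execute loop for three real 6502 opcodes.
memory = [0x00] * 65536

# Tiny program at $C000: LDA #$41 ; STA $0400 ; JMP $C005 (spin forever)
memory[0xC000:0xC008] = [0xA9, 0x41,        # LDA #$41  – load a value into the accumulator
                         0x8D, 0x00, 0x04,  # STA $0400 – store it at the start of C64 screen RAM
                         0x4C, 0x05, 0xC0]  # JMP $C005 – loop in place

pc, a = 0xC000, 0
for _ in range(10):                         # step a handful of instructions
    opcode = memory[pc]
    if opcode == 0xA9:                      # LDA immediate
        a = memory[pc + 1]; pc += 2
    elif opcode == 0x8D:                    # STA absolute (little-endian address)
        addr = memory[pc + 1] | (memory[pc + 2] << 8)
        memory[addr] = a; pc += 3
    elif opcode == 0x4C:                    # JMP absolute
        pc = memory[pc + 1] | (memory[pc + 2] << 8)

print(hex(memory[0x0400]))                  # -> 0x41, the value we "drew" into screen memory
```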

Nobody would do all this because it’s hard and would run at 1 FPM (frames per month) or something. BUT IT COULD BE DONE is the thing, and you would have identical results to a real C64, albeit somewhat slower.

In a thought experiment with unlimited time, pencils, and paper, it would be theoretically possible to run ANY DIGITAL COMPUTER PROGRAM EVER MADE.

Including any LLM/chatbot, Stable Diffusion, and Call of Duty.

Extrapolating from simple to complex

Yep, ChatGPT/Bing AI/whatever could be trained, modeled, and run purely on paper.

As in the Chinese Room, you wouldn’t have to understand what’s happening at all, just follow an extremely long list of rules. There would be no computer involved.

It might take millions of years to scribble out, but given the same training and input, the final output would be identical to the real ChatGPT.

Given this, there is no place where qualia or sentience can possibly come into play in a purely computational model, no matter how astonishing or realistic the output is or how mystifying a neural net seems to be.

I doubt anyone is going to claim their notebook has feelings or pencils feel pain, right? (if they do, they should probably think about moving out of Hogwarts)

Future AI is going to attempt to manipulate you for many reasons

To recap: There is 100% ZERO difference between ChatGPT being run by a CPU and being written out by hand in our thought experiment, WITHOUT A COMPUTER. Even if you include things like video/audio, they all enter and exit the program as plain old zeroes and ones. Use a stick to write a waveform into the dirt if you want.

This argument has nothing to do with the ultimate potential of digital-based AI; it merely points out that no matter how real they (it?) seem, they are not alive and never can be. Even if they break your heart and manipulate you into crying better than the best book/movie in the universe, it is still not alive – our thought experiment can still reproduce those same incredible AI-generated results.

If you incorporate anything beyond simple digital computing (like biological matter, brain cells or whatever) then all bets are off, as that’s a completely different situation.

Rule #1 of AI: Do not hook up a text prediction engine to your smart home

The liars are coming

AI set up to lie to humans and pretend to be real (keeping the user in the dark) is a separate threat that can’t be discounted either.

I mean, it’s not new or anything (singles in your area? Wow!) but these bots will be much better at conversation and at leading you down a specific path.

Being able to prove you aren’t a bot (for example, if you posted a GoFundMe) is going to become increasingly important. I guess we’ll develop various methods to do so, similar to how seller reviews on Amazon/eBay help determine honesty in those domains.

So we can abuse AI and bots?

Now hold on. There are good reasons not to let your kid call Siri a bitch or kick your Roomba – but it has nothing to do with the well-being of a chip in your phone and everything to do with the emotional growth of your child and the society we want to live in.

Conclusion

I’m pro-AI and I love the progress we’re making. I want a Star Trek-like future where everyone on the planet is allowed to flourish; I think it’s possible if we don’t screw it up.

We definitely can’t afford to let techno-superstition and robo-mysticism interfere.

It’s crucial that we ensure AI is not solely the domain of large corporations, but remains open to everyone. Support crowdsourced free/open stuff!

Anyway, this kind of got long, sorry.

This blogpost was written by a human. (but can you really believe that? Stay skeptical!)

A blog post detailing my obsessive dive into generative AI

An image that says "Seth's AI tools" next to a bad-ass skeleton that was generated by Stable Diffusion

Over the last few months I created a fun little toy called Seth’s AI Tools (creative name, huh?). It’s an open source Unity program that has become a playground for me to test a mishmash of AI stuff.

If you click the “AI Paintball” button inside of it, you get the thing shown in the YouTube video above.

This shitty proof-of-concept game generates every character sprite immediately before it’s used on-screen, based on the subject entered by the player. None of the art is included in the download. (well, a few things are, like the forest background and splat effects – although I did make them with this app too)

It’s 100% local and does not use any internet functionality. (behind the scenes, it’s using Stable Diffusion, GFPGAN, ESRGAN, CLIP interrogation, and DIS, among other ML/AI tech)

If I leave this running for twelve days, it will have generated and displayed over one million unique images during gameplay.

What can generative art bring to games?

Well, I figured this test would be interesting because having AI make unlimited unique but SIMILAR images of your opponents & teammates and popping them up randomly forces your brain to constantly make judgement calls.

You can never memorize the art patterns because everything is always new content. Sounds tiring now that I think about it.

If you don’t shoot an opponent fast enough, they will hit you. If you hit a friendly, you lose points.

Random thought: It might be interesting to render a second frame where I modify the first image and force a “smile” on it or something, but the whole thing looks like a bad flash game and I got kind of bored of working on it for now.

The challenge of trying to use dynamic AI art inside of a game

It’s neat to type in “purple corndog” and get a brand new picture in seconds. But as far as gamedev goes, what can you really do with a raw AI-created image on the fly?

Uhh… I guess you could…

  • Show pictures in a frame on a wall
  • Simple art for a “find the matching tiles” or a match three game
  • Background art, for gameplay or a title screen
  • Texture maps (can be tiled)

Your options are kind of limited.

To control the output better, one trick is to start with an existing image and use a mask so new data is only generated in certain parts. This gives you a lot more control – for example, you could change someone’s shirt without touching their face.
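
Here’s roughly what that looks like in code – a minimal sketch using Hugging Face’s diffusers inpainting pipeline rather than the exact stack my app/server uses, with made-up file names:

```python
# Mask-controlled generation (inpainting): only the white area of the mask is
# regenerated, everything else in the source image is preserved.
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

init_image = Image.open("pizza_template.png").convert("RGB")     # hypothetical file names
mask_image = Image.open("pizza_center_mask.png").convert("RGB")  # white = repaint this area

result = pipe(
    prompt="gourmet pizza topped with pineapple and jalapenos",
    image=init_image,
    mask_image=mask_image,
).images[0]
result.save("pizza_variation.png")
```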

I used this technique for my pizza screensaver test – I generated a pizza to use as a template once, then asked the AI to only fill in the middle of it (inpainting) without touching the outer crust. This is why every pizza has the same crust.

It works pretty well because I can hardcode the alpha mask, giving a nice circle-shaped sprite, so I don’t have to worry about shapes and edges at all. (see video below)

The “pizza” button in Seth’s AI tools. Every single pizza is unique and generated on the fly.

But with a newer technique called Dichotomous Image Segmentation that I hacked in a few days ago, I can now create an alpha-masked sprite dynamically in real-time. (a sprite being an object/creature image with a transparent background)

DIS works much better than the other approaches I tested, like chroma or luma keying. It can pick up someone in a green shirt in front of a green background, for example.

It’s a generally useful thing to have around, even if it isn’t perfect. (and like with everything in this field, better data from more training will improve it)
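
Once DIS (or any segmentation model) hands you a grayscale mask of the subject, turning the raw image into a transparent-background sprite is just an alpha merge. A minimal sketch with PIL, hypothetical file names:

```python
from PIL import Image

image = Image.open("generated_character.png").convert("RGB")
mask = Image.open("dis_subject_mask.png").convert("L")  # white = subject, black = background

sprite = image.copy()
sprite.putalpha(mask)                 # background pixels become transparent
sprite.save("character_sprite.png")   # PNG keeps the alpha channel
```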

This video shows a valid use: (I call it “removing background” in the video below, but it’s the same thing)

This shows how the “remove background” button works NOT in the game

Now moving on to the AI Paintball demo.

This isn’t a Rorschach ink blot test, it’s the starting shape I use to create all the characters in the AI Paintball test.

This image is the target of inpainting with a given text prompt; the background is removed (by creating an alpha mask of the subject) and voilà, there’s your chipmunk, skeleton, or whatever, ready to pop up from behind a bush.

A note on the hardware I’m using to run this

I’m using three RTX 3090 GPUs – that’s how I can generate an image per second or so. This means simply playing this game or using the pizza screensaver draws 1000+ watts of power on my system.
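
The back-of-the-envelope math behind those numbers, assuming roughly one image per second and ~350 watts per 3090 under load:

```python
images_per_second = 1
print(images_per_second * 60 * 60 * 24 * 12)  # ~1,036,800 – the "million images in twelve days"
print(3 * 350)                                # ~1,050 watts for the GPUs alone
```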

In other words, it’s the worst, most inefficient screensaver ever created and you should never use it as one.

If you only have one GPU, the game/pizza demo will look much emptier as it will be slower to make images. (this could be worked around by re-using images, but this kind of thing isn’t really for mass consumption anyway so I didn’t worry about it)

Oh, want to run my AI Tools server + app on your own computer?

Well, it’s a bit convoluted, so this is only for the dedicated AI lovers who have decent graphics cards.

My app requires that you also install a special server; this allows the two pieces to be updated separately and lets me offload the documentation on installing the server to others. (it can be tricky…)

There are instructions here, or google “automatic1111 webui setup tutorial for windows” and replace where they mention https://github.com/AUTOMATIC1111/stable-diffusion-webui with https://github.com/SethRobinson/aitools_server instead.

The setup is basically the same because my customized server *is* that one, just with a few extra features added, as well as ensuring that it hasn’t broken compatibility with my tools.

The dangers of letting the player choose the game subject dynamically

The greatest strength and weakness of something like this is that the player enters their own description and can shoot at anything or anyone they want.

A shirtless Mario, something I created as an, uh, example of what you shouldn’t do. Unless that’s your thing, I mean, nobody is going to know.

Unfortunately, Stable Diffusion’s weight data reflects the biases and stereotypes of the internet in general because, well, that’s what it’s trained on. Turns out the web has become quite the cesspool.

Tim Berners-Lee would be rolling in his… oh, he’s still alive actually, which really underscores how quickly everything has changed.

The pitfalls are many: for example, if someone chooses the opponent “terrorist”, you can guess what ethnicity the AI is going to choose.

Entering the names of well-known politicians and celebrities works too – there is no end of ways to create something offensive to someone with just a few keystrokes.

Despite this being a silly little tech demo nobody will see, I almost changed the name to “Cupid’s Arrows” – where you shoot hearts or something – in an effort to sidestep the ‘violence against X’ issue, but that seemed a bit too… I don’t know, condescending and obvious.

So I went with a paintball theme as a compromise; at least nobody is virtually dying now.

The legality of AI and the future

Well, this is my blog so I might as well put down some random thoughts about this too.

AI image generation is currently in the hot seat for being able to mimic popular artists’ styles and create copyrighted or obscene material more easily than ever before. (or for a good time, try both at once)

The Stable Diffusion data (called the weights) is around 4 GB, or 4,294,967,296 bytes. ALL images are created using only this data. It’s reportedly trained on 2.3 billion images scraped from around the internet.

Assuming that’s true, 4,294,967,296 bytes divided by 2.3 billion is only about two bytes per image on average. *

Two bytes is enough space to store a single number between 0 and 65,535. How can all this be possible with only one number per image?! Well, it’s simple: it’s merely computing possibilities in noise space that are tied to tokens, which are tied to words, and… uh… it’s all very mathy. Fine, I don’t really get it either.
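
If you want to check the arithmetic behind that claim yourself:

```python
weights_bytes = 4 * 1024**3              # ~4 GB of Stable Diffusion weights
training_images = 2_300_000_000          # ~2.3 billion training images
print(weights_bytes / training_images)   # ~1.87 bytes per image on average
```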

This data (and code to use it) was released to the public for free and is responsible for much of the explosion we’re seeing now.

Our copyright system has never had to deal with concepts like “AI training”. How would it ever be feasible to get permission to use 2.3 billion images, and is it really necessary if it results in only a few bytes of data per image?

I’m hoping legally we end up with an opt-out system instead of requiring permission for all training, because keep this in mind: if you want to remove someone from a picture or upscale it, the AI will do the best job if it’s been trained on similar data. Using crippled datasets will make things less useful across the board.

To remove the birdy, the AI has to understand faces to fill in the missing parts.

Copyright as it applies to AI needs to evolve as fast as the technology, but that’s unlikely to happen. We have to find a balance that protects IP without hamstringing humanity’s ability to use and create the most amazing thing since MP3s.

Image generation has gotten a lot of attention because, well, it’s visual. But the AI evolution/revolution happening now is also going to make your phone understand what you’re saying better than any human and help provide assistance to hurricane victims.

Any rules on what can and can’t be used for training will have implications far beyond picture tools.

* It’s a bit more complicated, as some images are trained at a higher resolution, a celebrity’s face or popular artist may appear in thousands of images, etc.

Uh, anyway

So that’s what I’ve been playing with the last few months. Also doing stuff with GPT-3 and text generation in general (Kobold-AI is a good place to start there).

Like any powerful tool, AI can be used for good or evil, but I think it’s amazing that an art pleb like me can now make a nice apple.

It’s still early; improvements are happening at an amazing pace and it’s going to get easier to use and install on every kind of device – but a warning:

Comparing output of DALL-E Mini to DALL-E Mega

Yeah, Mega is better of course. The end. But keep reading I guess.

So you’ve probably heard of OpenAI’s amazing DALL·E 2 text-to-picture generator. (I won’t shut up about it!)

Note: None of the pictures in this article were created with DALL·E 2

The wackiest AI generated memes aren’t coming from DALL·E 2. They are mostly coming from a different (and computationally weaker) model called DALL·E mini created by Boris Dayma.

Confusing, right? Well, OpenAI thought so too, and the end result is that the front-facing website where you can generate pics from your web browser is now called Craiyon.

That said, their open source AI model is still technically called Dall-E Mini. For now.

But why tho

So why do people use Dall-E Mini and other open source AI projects even when they have access to DALL·E 2 & Davinci?

  • I can run it myself locally for free*
  • I can create an API to use it in an automated way easily; OpenAI doesn’t yet offer a public API for DALL·E 2 (I’m sure it’s coming though)
  • No strict censorship – it’s impossible to use DALL·E 2 for many purposes because so many things aren’t allowed, including blood, guns, sexy terms, famous characters/people & religious symbols. But now I can finally use all of those at once! Uh… not saying I will, just… I mean, I just want to be able to. Uh, ethically and stuff. Yeah.
  • Can use commercially, unlike DALL·E 2 (for now)

* you know what I mean. The graphics card and energy to run it definitely aren’t free, but you aren’t reliant on someone else’s slow-ass webserver or extra fees.

Running my own version locally, no servers needed

So I dug in and got it running locally under Ubuntu with an NVIDIA 3090 and did some tests. Getting every library CUDA-enabled and playing nice with the others was kind of a headache.

Note: Many of these tests don’t require a 3090; I don’t think anything I’ve run has used more than 50% of my video RAM.

Special thanks to Sahar’s DALL-E Playground code and sorry for instantly defacing it to say SETH-E.

First some pics generated with DALL·E Mini:

It doesn’t understand “an orange” means one.
Um.. uhh… oh god.

Not great. But hold on – there is a more powerful data set Dayma has released called “Mega”.

So it’s still Dall-E Mini, but we’re using the “Mega-full” weight data, which requires more memory and time to generate stuff. After enabling that, the pictures are better:

As you can see, it now understands “an orange” to just be one and the faces are far less horrific.

Note: I ran these prompts a few times, the results were consistent

As far as I can tell, Dayma and friends are still training models, so I’m looking forward to Super-Mega or Giga or whatever, even if my home computer likely won’t be able to run them for much longer.

Switching to AI text generation for a second, I’ve also gotten GPT-J 6B running locally and am currently working towards getting EleutherAI/gpt-neox-20b running. (the closest open source thing to GPT-3 Davinci out there?)

Running gpt-neox-20b locally will require two 3090s in the same computer or buying a $30K graphics card. I’ll try the 3090s, I guess.
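
The quick math on why one card isn’t enough (weights only, ignoring activations and other overhead):

```python
params = 20_000_000_000                      # gpt-neox-20b: 20 billion parameters
bytes_per_param = 2                          # half precision (fp16)
print(params * bytes_per_param / 1024**3)    # ~37 GB of weights – more than one 24 GB 3090
print(2 * 24)                                # 48 GB across two 3090s, enough to squeeze it in
```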

So how’s the quality of Mega?

Well, nowhere near OpenAI’s offering but it’s definitely getting better.

Pics I made:

“Bloody teddy bear” – Behold, something impossible to create with DALL·E 2. (well, using the word “bloody” anyway)

“teddy bear on a skateboard in times square”

“photo of cavalier dog in space suit”
“monkey riding a motorcycle black and white pencil drawing”

Seth, what are you even doing with all this AI stuff?

First, it’s just addictive to play with, and it’s fun trying to think of novel ways to use it.

The open source “state of the art” is changing regularly as new models and datasets are released from all over. It’s going to be a crazy decade – machine learning & AI is affecting EVERYTHING EVERYWHERE, and it’s not just for putting animals in spacesuits and generating erotic Willow (1988) fanfic.

I have some projects in mind but.. yeah, we’ll see.

Oh, for making it all the way to the end, have another orange test – but this time I generated it using ruDALL-E with the Malkovich dataset, yet another AI project to keep an eye on:

The 100 Prisoners Problem riddle interactive web app simulation I did in Unity

So this is one of those times where I made something in a few hours and want it to be indexed on the web rather than just the ethereal world of Twitter, so I’m making this post about it in the hopes that people will find it with a very specific Google search. (probably some kid stealing this for his homework.. steal away, I don’t mind!)

The app looks like this. You can pan and zoom around and click buttons to control the simulation.

Play it here

Full source code of my unity project (github)

So if you’ve never heard of the 100 Prisoners Problem riddle, it’s an amazing math trick where the solution seems to defy all logic. The way I was introduced to it was Veritasium‘s easy-to-understand video on the subject:
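
If you’d rather read the strategy than watch the video, here’s the gist of what the simulation demonstrates – a plain Python sketch of the loop-following strategy (not my Unity/C# code linked above): each prisoner starts at the box labeled with their own number and keeps following the number they find inside.

```python
import random

def everyone_survives(n=100, max_opens=50):
    boxes = list(range(n))
    random.shuffle(boxes)            # boxes[i] = prisoner number hidden inside box i
    for prisoner in range(n):
        box, found = prisoner, False
        for _ in range(max_opens):
            if boxes[box] == prisoner:
                found = True
                break
            box = boxes[box]         # follow the chain to the next box
        if not found:
            return False             # one failure dooms everyone
    return True

trials = 10_000
wins = sum(everyone_survives() for _ in range(trials))
print(wins / trials)                 # ~0.31 – absurdly better than the "basically zero" you'd expect
```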

Still here? Fine, go check out the Monty Hall Problem then!

How to create Simon Belmont with DALL·E 2

Simon Belmont as he appears in Castlevania: Grimoire of Souls (Src: Wikipedia)

This morning OpenAI changed the rules – we can share pictures with faces now! To celebrate, I figured I’d have DALL·E create a real-life photo of Castlevania hero Simon Belmont. He should look something like the above picture, right?

I’ll just enter the name and the style of photo I want and with the magic of AI we get…

“Simon Belmont , Professional Photograph in the studio, perfect lighting, bokeh”

…some bikers and Neo wannabes. DALL·E has been programmed to ignore (?) famous people, and I guess that extends to fictional characters as well. I had poor results with Mickey Mouse and Shrek too.

It will never closely duplicate a celebrity’s face, or anybody’s face for that matter; it will only output heavily “mixed” results. (this is a legal/ethical choice rather than a technological limitation, I believe)

So the secret is to forget the name and craft a worthy sentence describing the target in textual detail. Actually, I get slightly better results including the name, so I’ll keep that too.

As a representative of lazy people everywhere, I’ll use OpenAI’s GPT-3 DaVinci to create the description for me. (Their text AI tools have no qualms about referencing famous people or anything else)

Perfect. Now we feed the AI created description into DALL·E and get…

“Simon Belmont is a tall and muscular man with long, flowing blond hair. He has piercing blue eyes and a chiseled jawline. He typically wears a red tunic with a white undershirt, brown trousers, and black boots. He also wears a red cape and golden cross around his neck, Professional Photograph in the studio, perfect lighting, bokeh”

Well, much closer. You know, we should have added a whip.
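
For reference, the lazy GPT-3 step looks roughly like this in code – a sketch using the openai Python library’s old Completion API; the engine name and settings are illustrative guesses, not my exact setup. The result then gets pasted into DALL·E’s prompt box by hand.

```python
import openai

openai.api_key = "sk-..."  # your key here

# Step 1: ask GPT-3 for a physical description of the character.
completion = openai.Completion.create(
    engine="text-davinci-002",
    prompt="Describe the physical appearance of Simon Belmont from Castlevania in detail.",
    max_tokens=150,
)
description = completion.choices[0].text.strip()

# Step 2: tack on the photo-style suffix and paste the whole thing into DALL·E.
print(description + ", Professional Photograph in the studio, perfect lighting, bokeh")
```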

The quality stands up pretty well at full resolution too:

What a hero! We may have found the box art for Dink Smallwood 2… ! Or a romance novel. Oh, wait, we can’t use any of this generated stuff commercially yet, too bad.

Add an eye patch for Goro Majima Belmont

Conclusion

Being a skilled writer (unlike the person typing) will probably result in better images. All those pages of boring descriptive prose in The Hobbit would create masterpieces!

I’ve been dabbling with creating creature sprites/concept art to fit existing games (like Dink Smallwood), but inpainting techniques have not been producing good results yet. Still learning and playing with things.