{"id":2987,"date":"2022-11-06T08:21:24","date_gmt":"2022-11-05T23:21:24","guid":{"rendered":"https:\/\/www.codedojo.com\/?p=2987"},"modified":"2023-02-14T08:12:42","modified_gmt":"2023-02-13T23:12:42","slug":"a-blog-post-detailing-my-obsessive-dive-into-generative-ai","status":"publish","type":"post","link":"https:\/\/www.codedojo.com\/?p=2987","title":{"rendered":"A blog post detailing my obsessive dive into generative AI"},"content":{"rendered":"\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"506\" height=\"285\" src=\"https:\/\/www.codedojo.com\/wp-content\/uploads\/2022\/11\/blog_thumb_506_285.jpg\" alt=\"An image that says &quot;Seth's AI tools&quot; next to a bad-ass skeleton that was generated by stable diffusion\" class=\"wp-image-3008\" srcset=\"https:\/\/www.codedojo.com\/wp-content\/uploads\/2022\/11\/blog_thumb_506_285.jpg 506w, https:\/\/www.codedojo.com\/wp-content\/uploads\/2022\/11\/blog_thumb_506_285-300x169.jpg 300w\" sizes=\"auto, (max-width: 506px) 100vw, 506px\" \/><\/figure>\n\n\n\n<p>Over the last few months I created a fun little toy called <a href=\"https:\/\/github.com\/SethRobinson\/aitools_server\">Seth&#8217;s AI Tools<\/a> (creative name, huh?), it&#8217;s an <a href=\"https:\/\/github.com\/SethRobinson\/aitools_client\">open source Unity program<\/a> that has become a playground for me to test a mishmash of AI stuff.  
<\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<iframe loading=\"lazy\" title=\"AI Paintball - An experiment with real-time AI generated graphics using stable diffusion\" width=\"625\" height=\"352\" src=\"https:\/\/www.youtube.com\/embed\/FoYY_90KlyE?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe>\n<\/div><\/figure>\n\n\n\n<p>If you click the <strong>&#8220;AI Paintball<\/strong>&#8221; button inside of it, you get the thing shown in the youtube video above.<\/p>\n\n\n\n<p>This <s>shitty game<\/s> proof of concept generates every character image sprite <strong>immediately before<\/strong> it&#8217;s used on-screen based on the subject entered by the player.  None of the art is included in the download.  (well, a few things are, like the forest background and splat effects &#8211; although I did make them with this app too)<\/p>\n\n\n\n<p>It&#8217;s 100% local and does not use any internet functionality. 
(behind the scenes, it&#8217;s using <a href=\"https:\/\/github.com\/CompVis\/stable-diffusion\">Stable Diffusion<\/a>, <a href=\"https:\/\/github.com\/TencentARC\/GFPGAN.git\">GFPGAN<\/a>, <a href=\"https:\/\/github.com\/xinntao\/ESRGAN\">ESRGAN<\/a>, <a href=\"https:\/\/openai.com\/blog\/clip\/\">CLIP interrogation<\/a>, and <a rel=\"noreferrer noopener\" href=\"https:\/\/github.com\/xuebinqin\/DIS\" target=\"_blank\">DIS<\/a> among other ML\/AI tech)<\/p>\n\n\n\n<p>If I leave this running for twelve days, it will have generated and displayed over <strong>one million unique images<\/strong> during gameplay.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">What can generative art bring to games?<\/h2>\n\n\n\n<p>Well, I figured this test would be interesting because having AI make <em><strong>unlimited unique but SIMILAR images<\/strong><\/em> of your opponent &amp; teammates and popping them up randomly forces your brain to constantly make judgement calls.<\/p>\n\n\n\n<p>You can never memorize the art patterns because everything is always new content.  Sounds tiring now that I think about it.<\/p>\n\n\n\n<p>If you don&#8217;t shoot an opponent fast enough, they will hit you.  If you hit a friendly, you lose points.<\/p>\n\n\n\n<p><strong>Random thought:<\/strong> It might be interesting to render a second frame where I modify the first image and force a &#8220;smile&#8221; on it or something, but the whole thing looks like a bad Flash game and I got kind of bored of working on it for now.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The challenge of trying to use dynamic AI art inside of a game<\/h2>\n\n\n\n<p>It&#8217;s neat to type in <strong>&#8220;purple corndog&#8221;<\/strong> and get a brand new picture in seconds.  
But as far as gamedev goes, what can you really do with a raw AI-created image on the fly?<\/p>\n\n\n\n<p>Uhh&#8230; I guess you could&#8230;<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Show pictures in a frame on a wall<\/li>\n\n\n\n<li>Simple art for a &#8220;find the matching tiles&#8221; or a match three game<\/li>\n\n\n\n<li>Background art, for gameplay or a title screen<\/li>\n\n\n\n<li>Texture maps (can be tiled)<\/li>\n<\/ul>\n\n\n\n<p><em>Your options are kind of limited.<\/em><\/p>\n\n\n\n<p>To control the output better, one trick is to start with an existing image and use a mask to only generate new data in certain parts.  This gives you a lot more control: for example, you could change only someone&#8217;s shirt and not touch their face.<\/p>\n\n\n\n<p>I used this technique for my <strong>pizza screensaver test<\/strong> &#8211; I generated a pizza to use as a template once, then asked the AI to only fill in the middle of it (inpainting) without touching the outer crust.  This is why every pizza has the same crust.<\/p>\n\n\n\n<p>It works pretty well because I can hardcode the alpha mask, so it&#8217;s a nice circle-shaped sprite and I don&#8217;t have to worry about shapes and edges at all.  (see video below)<\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<iframe loading=\"lazy\" title=\"Ultimate pizza screensaver\" width=\"625\" height=\"352\" src=\"https:\/\/www.youtube.com\/embed\/udGIWo_vMoc?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe>\n<\/div><figcaption class=\"wp-element-caption\">The &#8220;pizza&#8221; button in Seth&#8217;s AI tools.  
Every single pizza is unique and generated on the fly.<\/figcaption><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>But with a newer technique called <a href=\"https:\/\/github.com\/xuebinqin\/DIS\">Dichotomous Image Segmentation<\/a> that I hacked in a few days ago I can now create an alpha masked sprite dynamically in real-time. (A sprite being an object\/creature image with a transparent background)  <\/p>\n\n\n\n<p>Using DIS works much better than other tests I did trying to use chroma or luma keying.  It can pick up someone in a green shirt in front of a green background, for example.<\/p>\n\n\n\n<p>It&#8217;s a generally useful thing to have around, even if it isn&#8217;t perfect. (<em>and like with everything in this field, better data from more training will improve it<\/em>)<\/p>\n\n\n\n<p>This video shows a valid use:  (I call it &#8220;removing background&#8221; in the video below, but it&#8217;s the same thing)<\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<iframe loading=\"lazy\" title=\"Seth&#039;s AI Tools new feature - Remove Background\" width=\"625\" height=\"352\" src=\"https:\/\/www.youtube.com\/embed\/3PmZ_9QfrE0?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe>\n<\/div><figcaption class=\"wp-element-caption\">This shows how the &#8220;remove background&#8221; button works NOT in the game<\/figcaption><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-media-text alignwide is-stacked-on-mobile\"><figure class=\"wp-block-media-text__media\"><img loading=\"lazy\" decoding=\"async\" width=\"512\" height=\"512\" src=\"https:\/\/www.codedojo.com\/wp-content\/uploads\/2022\/11\/template_dark_gray_pillar_background.bmp\" alt=\"\" class=\"wp-image-2992 
size-full\"\/><\/figure><div class=\"wp-block-media-text__content\">\n<p>Now moving on to the AI Paintball demo.<\/p>\n\n\n\n<p>This isn&#8217;t a Rorschach ink blot test, it&#8217;s the starting shape I use to create all the characters in the AI Paintball test.<\/p>\n\n\n\n<p><\/p>\n<\/div><\/div>\n\n\n\n<p>This image is inpainted with a given text prompt, then the background is removed (by creating an alpha mask of the subject) and voil\u00e0, there&#8217;s your chipmunk, skeleton, or whatever, ready to pop up from behind a bush.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">A note on the hardware I&#8217;m using to run this<\/h2>\n\n\n\n<p>I&#8217;m using three <strong>RTX 3090 GPUs<\/strong>, which is how I can generate an image per second or so.  This means simply playing this game or using the pizza screen saver uses 1000+ watts of power on my system.<\/p>\n\n\n\n<p>In other words, <strong>it&#8217;s the worst, most inefficient screen saver ever created<\/strong> and you should never use it as one.<\/p>\n\n\n\n<p>If you only have one GPU the game\/pizza demo will look much emptier as it will be slower to make images.  (this could be worked around by re-using images but this kind of thing isn&#8217;t really for mass consumption anyway so I didn&#8217;t worry about it)<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Oh, want to run my AI Tools server + app on your own computer? <\/h2>\n\n\n\n<p>Well, it&#8217;s a bit convoluted so this is only for the dedicated AI lovers who have decent graphics cards. <\/p>\n\n\n\n<p>My app requires that you also install a special server; this allows the two pieces to be updated separately and lets me offload the documentation on installing the server to others. 
(it can be tricky&#8230;)<\/p>\n\n\n\n<p>There are <a href=\"https:\/\/github.com\/SethRobinson\/aitools_server#installation-and-running-modified-from-stable-diffusion-webui-docs\">instructions here<\/a>, or google &#8220;automatic1111 webui setup tutorial for windows&#8221; and wherever they mention <strong>https:\/\/github.com\/AUTOMATIC1111\/stable-diffusion-webui <\/strong>use <strong>https:\/\/github.com\/SethRobinson\/aitools_server<\/strong> instead.  <\/p>\n\n\n\n<p>The setup is basically the same, as my customized server <strong>*is*<\/strong> that one, just with a few extra features added, as well as ensuring that it hasn&#8217;t broken compatibility with my tools.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The dangers of letting the player choose the game subject dynamically<\/h2>\n\n\n\n<p>The greatest strength and weakness of something like this is that the player enters their own description and <strong>can shoot at anything or anyone they want<\/strong>.   <\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-media-text alignwide is-stacked-on-mobile\"><figure class=\"wp-block-media-text__media\"><img loading=\"lazy\" decoding=\"async\" width=\"944\" height=\"830\" src=\"https:\/\/www.codedojo.com\/wp-content\/uploads\/2022\/11\/image-1.png\" alt=\"\" class=\"wp-image-3007 size-full\" srcset=\"https:\/\/www.codedojo.com\/wp-content\/uploads\/2022\/11\/image-1.png 944w, https:\/\/www.codedojo.com\/wp-content\/uploads\/2022\/11\/image-1-300x264.png 300w, https:\/\/www.codedojo.com\/wp-content\/uploads\/2022\/11\/image-1-768x675.png 768w, https:\/\/www.codedojo.com\/wp-content\/uploads\/2022\/11\/image-1-624x549.png 624w\" sizes=\"auto, (max-width: 944px) 100vw, 944px\" \/><\/figure><div class=\"wp-block-media-text__content\">\n<p>A shirtless Mario, something I created as an, uh, example of what you shouldn&#8217;t do. 
Unless that&#8217;s your thing, I mean, nobody is going to know.<\/p>\n<\/div><\/div>\n\n\n\n<p>Unfortunately, the Stable Diffusion weight data reflects the <strong>biases and stereotypes<\/strong> of the internet in general because, well, that&#8217;s what it&#8217;s trained on. Turns out the web has become quite the cesspool.  <\/p>\n\n\n\n<p>Tim Berners-Lee would be rolling in his&#8230; oh, he&#8217;s still alive actually, which really underscores how quickly everything has changed.<\/p>\n\n\n\n<p>The pitfalls are many: for example, if someone chooses the opponent &#8220;terrorist&#8221;, you can guess what ethnicity the AI is going to choose.<\/p>\n\n\n\n<p>Entering the names of well-known <strong>politicians<\/strong> and <strong>celebrities<\/strong> works too &#8211; there is no end of ways to create something offensive to someone with just a few keystrokes.<\/p>\n\n\n\n<p>Despite this being a silly little tech demo nobody will see, I almost changed the name to <strong>&#8220;Cupid&#8217;s Arrows&#8221;<\/strong> where you shoot hearts or something in an effort to side-step the &#8216;violence against X&#8217; issue, but that seemed a bit too&#8230; I don&#8217;t know, condescending and obvious.  <\/p>\n\n\n\n<p>So I went with a paintball theme as a compromise, at least nobody is virtually dying now.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The legality of AI and the future<\/h2>\n\n\n\n<p>Well, this is my blog so I might as well put down some random thoughts about this too.<\/p>\n\n\n\n<p>AI image generation is currently <strong>in the hot seat<\/strong> for being able to mimic popular artists&#8217; styles and create copyrighted or obscene material more easily than ever before.  (or for a good time, try both at once)<\/p>\n\n\n\n<p>The Stable Diffusion data (called the weights) is around 4 GB, or 4,294,967,296 bytes.  ALL images are created using only this data.  
It&#8217;s reportedly trained on <strong><a href=\"https:\/\/waxy.org\/2022\/08\/exploring-12-million-of-the-images-used-to-train-stable-diffusions-image-generator\/\">2.3 billion images<\/a><\/strong> from all around the internet. <\/p>\n\n\n\n<p>Assuming that&#8217;s true, <strong>4,294,967,296 bytes divided by 2.3 billion is only about two bytes per image<\/strong> on average. * <\/p>\n\n\n\n<p>Two bytes is enough space to store a single number between 0 and 65535. How can all this be possible with only one number per image?!  Well, it&#8217;s simple, it&#8217;s merely computing possibilities in noise space that are tied to tokens which are tied to words and &#8230; uh.. it&#8217;s all very mathy.  Fine, I don&#8217;t really get it either.<\/p>\n\n\n\n<p>This data (and code to use it) was released to the public for free and is responsible for much of the explosion we&#8217;re seeing now.<\/p>\n\n\n\n<p>Our copyright system has never had to deal with concepts like &#8220;AI training&#8221;.  
How would it ever be feasible to get permission to use 2.3 billion images, and is it really necessary if it results in only a few bytes of data for each?<\/p>\n\n\n\n<p>I&#8217;m hoping legally we end up with an opt-out system instead of requiring permission for all training, because keep this in mind:  <strong>If you want to remove someone from a picture or upscale it, it will do the best job if it&#8217;s been trained on similar data.<\/strong>  Using crippled data sets will make things less useful across the board.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"322\" src=\"https:\/\/www.codedojo.com\/wp-content\/uploads\/2022\/11\/ai_tools_birdy_to_bird-1024x322.jpg\" alt=\"\" class=\"wp-image-3003\" srcset=\"https:\/\/www.codedojo.com\/wp-content\/uploads\/2022\/11\/ai_tools_birdy_to_bird-1024x322.jpg 1024w, https:\/\/www.codedojo.com\/wp-content\/uploads\/2022\/11\/ai_tools_birdy_to_bird-300x94.jpg 300w, https:\/\/www.codedojo.com\/wp-content\/uploads\/2022\/11\/ai_tools_birdy_to_bird-768x241.jpg 768w, https:\/\/www.codedojo.com\/wp-content\/uploads\/2022\/11\/ai_tools_birdy_to_bird-1536x483.jpg 1536w, https:\/\/www.codedojo.com\/wp-content\/uploads\/2022\/11\/ai_tools_birdy_to_bird-2048x644.jpg 2048w, https:\/\/www.codedojo.com\/wp-content\/uploads\/2022\/11\/ai_tools_birdy_to_bird-624x196.jpg 624w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">To remove the birdy, the AI has to understand faces to fill in the missing parts. <\/figcaption><\/figure>\n\n\n\n<p>Copyright as it applies to AI needs to evolve as fast as the technology, but that&#8217;s unlikely to happen.  We have to find a balance that protects IP, but not at the cost of hamstringing humanity&#8217;s ability to use and create the most amazing thing since mp3s.<\/p>\n\n\n\n<p>Image generation has gotten a lot of attention because, well, it&#8217;s visual. 
But the AI evolution\/revolution happening is also going to make your phone <a href=\"https:\/\/openai.com\/blog\/whisper\/\">understand what you&#8217;re saying<\/a> better than any human and <a href=\"https:\/\/odsc.medium.com\/google-and-givedirectly-use-ai-to-provide-cash-assistance-to-victims-of-hurricane-ian-956c020fc4b4\">help give assistance to hurricane victims<\/a>.  <\/p>\n\n\n\n<p>Any rules on what can and can&#8217;t be used for training will have implications far beyond picture tools.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<p>* <em>it&#8217;s a bit more complicated as some images are trained at a higher resolution, a celebrity&#8217;s face or popular artist may be in thousands of images, etc.<\/em><\/p>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Uh, anyway<\/h2>\n\n\n\n<p>So that&#8217;s what I&#8217;ve been playing with the last few months. Also doing stuff with GPT-3 and text generation in general (<a href=\"https:\/\/github.com\/KoboldAI\/KoboldAI-Client\">Kobold-AI<\/a> is a good place to start there).<\/p>\n\n\n\n<p>Like any powerful tool, AI can be used for good or evil, but I think it&#8217;s amazing that an art pleb like me can now make a nice apple.<\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<iframe loading=\"lazy\" title=\"Seth&#039;s AI Tools - Growing an Apple\" width=\"625\" height=\"352\" src=\"https:\/\/www.youtube.com\/embed\/2TB4f8ojKYo?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe>\n<\/div><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>It&#8217;s still early,  improvements are happening at an amazing pace and it&#8217;s going to get easier to use and install on every kind of device &#8211; but a warning:<\/p>\n\n\n\n<figure 
class=\"wp-block-embed is-type-rich is-provider-twitter wp-block-embed-twitter\"><div class=\"wp-block-embed__wrapper\">\n<blockquote class=\"twitter-tweet\" data-width=\"550\" data-dnt=\"true\"><p lang=\"en\" dir=\"ltr\">So now that <a href=\"https:\/\/twitter.com\/hashtag\/StableDiffusion?src=hash&amp;ref_src=twsrc%5Etfw\">#StableDiffusion<\/a> and its data has been fully open sourced (usable without any filters), I&#39;d like to warn everyone to never believe anything based solely on a photo ever again.  Also, unrelated, but uh, wasn&#39;t Carrie Fisher good in Star Trek?! <a href=\"https:\/\/t.co\/yJ7Qtlixy5\">pic.twitter.com\/yJ7Qtlixy5<\/a><\/p>&mdash; Seth A. Robinson (@rtsoft) <a href=\"https:\/\/twitter.com\/rtsoft\/status\/1561974575847833600?ref_src=twsrc%5Etfw\">August 23, 2022<\/a><\/blockquote><script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script>\n<\/div><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>Over the last few months I created a fun little toy called Seth&#8217;s AI Tools (creative name, huh?), it&#8217;s an open source Unity program that has become a playground for me to test a mishmash of AI stuff. 
If you click the &#8220;AI Paintball&#8221; button inside of it, you get the thing shown in the [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[35,11,3,6,21],"tags":[],"class_list":["post-2987","post","type-post","status-publish","format-standard","hentry","category-ai","category-art","category-development","category-tech-tips","category-unity"],"_links":{"self":[{"href":"https:\/\/www.codedojo.com\/index.php?rest_route=\/wp\/v2\/posts\/2987","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.codedojo.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.codedojo.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.codedojo.com\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.codedojo.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=2987"}],"version-history":[{"count":20,"href":"https:\/\/www.codedojo.com\/index.php?rest_route=\/wp\/v2\/posts\/2987\/revisions"}],"predecessor-version":[{"id":3032,"href":"https:\/\/www.codedojo.com\/index.php?rest_route=\/wp\/v2\/posts\/2987\/revisions\/3032"}],"wp:attachment":[{"href":"https:\/\/www.codedojo.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=2987"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.codedojo.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=2987"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.codedojo.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=2987"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}