[ Beneath the Waves ]

On AI Art Generation

article by Ben Lincoln


Over the years, I've seen at least two career paths evaporate in part or entirely because someone developed a technology that disrupted those industries, so I definitely sympathize with artists who are worried about their future given the rapid advances AI-generated imagery is making.

That having been said, disruptive technologies are magic boxes full of all kinds of possibilities (both good and bad), and the one thing that's essentially never inside the box is a way to put everything back inside and pretend the box never existed. Hoping that people will stop using AI image generation (or insisting that they do so) is as futile as Metallica trying to tell people to keep buying CDs and stop listening to mp3s in the late 1990s.[1]

Reflecting on a magic box that has just been opened
The Magic Box

Generated using Midjourney.


In terms of technology likely to be developed during any of our lifetimes, my advice is to think about what it is about your work that inherently requires a human mind, and focus on being good at those things. If someone invents a tool that makes the other parts of your work easier or faster, look at how that gives you more time to do the creative work. The CEO of the company I work for calls this the "Iron Man suit" model[2], where humans do the things humans are best at, including directing automated systems to do the repetitive/procedural tasks that don't require a conscious mind, or are at least beyond the reach of current automated systems.

If we ever develop artificial consciousness that can do truly creative work, make non-obvious connections/inferences/deductions (especially across different fields), and so on (basically, do things that haven't been done before), we'll have to figure out some non-capitalistic approach to society because none of us will be able to find paying jobs anymore, but I think that's many decades or maybe even centuries away.

For me, the value of human artists is almost entirely their minds, not whether they can paint something I describe in the style of Greg Rutkowski. It's the surprises, details, and (for commissions) the collaborative aspects. Artists are lucky, because their field will always benefit from the human element. There are massive swathes of industry that aren't like that, and the only reason those jobs haven't been completely automated away is that no one has figured out how to make machines that are cheap enough. That could change at any time for any of those other jobs.

How I View AI-Generated Visual Art

The value of AI art for me is the ability to quickly generate things that are basically mashups of concepts to see what works, or that are "good enough" to use for (hopefully) funny memes. The "Jodorowsky's Tron" Midjourney example is (in my opinion) a fun glimpse into a parallel timeline. Probably no one will ever spend the money to make an actual film that looks like that, but we get to see something kind of like a recording of someone else's dream about it. There are many similar examples, and probably a lot of them will do things like convince film studios to let a human production team make a coherent movie that wouldn't exist if someone hadn't been able to show the studio execs the AI-generated concepts. In other words, it's no different than a writer stitching together a private "pitch video" from a bunch of pieces of other people's media.

The main reason I approach AI art generation that way is because of how it works. It's essentially doing this:

  1. Considering the entire "probability space" of images in the training set related to the prompt or other constraints, what general patterns of colour appear on the canvas? i.e. if I ask for an image of a scarecrow in a field, probably the upper 1/3, 1/2, or 2/3 is blue, the bottom is green or yellow, and there's a brownish blob centered horizontally at about 1/3, 1/2, or 2/3 of the image width. Pick one of those variations at random.
  2. Given those initial blurs of colour, and still taking the prompt/constraints into account, what slightly smaller blobs of colour are likely to appear in that type of image? i.e. for a scarecrow standing in a field, the brownish blur in the middle probably has a yellow or white blur at the top for the head, a different coloured blur near the middle for a shirt, and maybe a thin black vertical blur near the left or right side if it's holding a rake/shovel/etc. Select those blobs randomly and add them.
  3. Keep repeating this process over and over until the image is sufficiently detailed.
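As a rough illustration of the steps above, here is a toy Python sketch. It is only a model of the simplified description in this article, not the algorithm Midjourney or Stable Diffusion actually uses. The tiny `TRAINING_SET`, the colour letters, and the `downsample` and `generate` helpers are all made up for the example: each pass doubles the resolution and picks every cell at random from the colours the "training set" shows at that spot inside the coarser blob chosen in the previous pass.

```python
import random
from collections import Counter, defaultdict

# Hypothetical 4x4 "training images" for the prompt "scarecrow in a field".
# B = blue sky, G = green field, Y = yellow field, S = brownish scarecrow.
TRAINING_SET = [
    ["BBBB", "BSBB", "GSGG", "GGGG"],
    ["BBBB", "BBSB", "BBSB", "YYYY"],
    ["BBBB", "GSGG", "GSGG", "GGGG"],
]


def downsample(image, size):
    """Shrink a square image to size x size using the most common colour per block."""
    step = len(image) // size
    rows = []
    for r in range(size):
        row = ""
        for c in range(size):
            block = [image[r * step + i][c * step + j]
                     for i in range(step) for j in range(step)]
            row += Counter(block).most_common(1)[0][0]
        rows.append(row)
    return rows


def generate(training_set, seed=None):
    """Return the canvas at each step: 1x1, then 2x2, and so on up to full size."""
    rng = random.Random(seed)
    full_size = len(training_set[0])  # assumed square with a power-of-two side
    steps, canvas, size = [], None, 1
    while size <= full_size:
        # Tally the colours seen in each cell at this resolution, grouped by
        # the colour of the coarser "parent" blob that the cell sits inside.
        options = defaultdict(list)
        for image in training_set:
            fine = downsample(image, size)
            coarse = downsample(image, size // 2) if size > 1 else None
            for r in range(size):
                for c in range(size):
                    parent = coarse[r // 2][c // 2] if coarse else None
                    options[(r, c, parent)].append(fine[r][c])
        # Refine the canvas: pick each cell at random from what the training
        # set suggests is likely, given the blob chosen in the previous step.
        refined = []
        for r in range(size):
            row = ""
            for c in range(size):
                parent = canvas[r // 2][c // 2] if canvas else None
                row += rng.choice(options[(r, c, parent)])
            refined.append(row)
        canvas = refined
        steps.append(canvas)
        size *= 2
    return steps


if __name__ == "__main__":
    for step in generate(TRAINING_SET, seed=2022):
        print("\n".join(step), end="\n\n")
```

Running it with different seeds gives different but always "typical" layouts, because every choice is drawn from what the training images already contain.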

There's a nice visualization of this iterative process in this TikTok video by @nokeeg. Here's an example using some of the early Midjourney renditions that led to the image near the top of this article:

[ An animation showing the steps in rendering the magic box image ]
Image iterations (12%, 25%, 37%, 50%, 68%, 81%, 100%)

Generated using Midjourney.


The software is basically "riffing" on what that kind of image (or element within an image) usually looks like. While the results can be surprising if they combine prompts/constraints that one hasn't seen before, the composition and details won't be surprising unless the person viewing them has been isolated from popular culture for decades.

This is why AI art generators can also be used to generate higher-resolution versions of small images - they "hallucinate" the missing information based on what commonly appears in the source image set for similar images. It doesn't reveal any detail that wasn't actually there in the original image, but it looks more or less the way you'd expect because it's based on thousands of other images of similar things.
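The "hallucinated" detail can be sketched with a toy lookup-based upscaler in Python. This is an assumption-laden illustration, not how any real upscaling model works: `TRAINING_SET`, `learn_patches`, and `upscale` are hypothetical names, and the "model" is just a count of which 2x2 patch most often stood behind each low-resolution colour.

```python
from collections import Counter, defaultdict

# Hypothetical high-resolution "training images".
# B = blue sky, G = green grass, S = brownish scarecrow.
TRAINING_SET = [
    ["BBBB", "BSBB", "GSGG", "GGGG"],
    ["BBBB", "BBSB", "GGSG", "GGGG"],
    ["BBBB", "BSBB", "GSGG", "GGGG"],
]


def learn_patches(training_set):
    """For each low-res colour, count the 2x2 high-res patches it summarised."""
    seen = defaultdict(Counter)
    for image in training_set:
        for r in range(0, len(image), 2):
            for c in range(0, len(image[r]), 2):
                patch = (image[r][c:c + 2], image[r + 1][c:c + 2])
                # The low-res pixel is taken to be the patch's dominant colour.
                colour = Counter(patch[0] + patch[1]).most_common(1)[0][0]
                seen[colour][patch] += 1
    return seen


def upscale(small, seen):
    """Double the resolution by pasting in the most common patch per pixel."""
    big = []
    for row in small:
        patches = []
        for colour in row:
            common = seen[colour].most_common(1) if colour in seen else []
            # Fall back to a flat patch for colours never seen in training.
            patches.append(common[0][0] if common else (colour * 2, colour * 2))
        big.append("".join(top for top, _ in patches))
        big.append("".join(bottom for _, bottom in patches))
    return big


if __name__ == "__main__":
    print("\n".join(upscale(["BB", "GG"], learn_patches(TRAINING_SET))))
```

The output is twice the size of the input, but every added pixel comes from the training counts rather than from the original scene, which is the sense in which no real detail is revealed.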

Conversely, one of the ways a human artist can differentiate themselves from AI-generated art is to try new things, and go in unexpected directions. You want to make a picture of a futuristic city like no one has seen before? Don't just look at other people's pictures of futuristic cities. Look at the architecture of ancient civilizations and think about what it would look like if the same senses of geometry and layout were extrapolated thousands of years into the future. Making a picture of an alien spaceship interior? Everyone has already seen designs copied from H.R. Giger paintings. Look at the kinds of homes that animals make for themselves, and then imagine those shapes and layout made using advanced materials. Compose your images based on a grid of 5x5, 7x7, or some other larger prime number instead of the usual 2x2 or 3x3. Work with physical media instead of (or in addition to) digital tools.

Learn to use new technologies for yourself. Most popular coverage of AI art generation is focused on how non-artists use it, but the same technology can be used more like a semi-intelligent paintbrush to quickly try out concepts before deciding to commit to a particular direction and details, like a film director using low-quality previsualization to figure out how to film the real cast later. Here's a basic example of using Stable Diffusion to sketch new elements into existing images (skip to the 3:10 and 4:06 marks if you like).

There are certainly some types of commercial art that won't be as lucrative now that "looks more or less the way I'd expect" can be automated. Nearly every science fiction novel set in space has cover art consisting of a random planet with a random spaceship sitting in front of it. Typically, neither the planet nor the spaceship will be related to the story in the novel (excepting, of course, novels based on a particular film or television series). Some of that cover art is pretty neat, but usually it's disappointingly similar to thousands of other cover art paintings. It's probably not going to be possible to make a living entirely by hand painting "generic spaceship in front of a generic planet" cover art anymore.

Similarly, I think it's very likely that the market for stock photos (both taking the pictures and licensing them to other people) is going to become much smaller than it is today. Stock photos of news-related events will still be valuable, but I'd be surprised if businesses were still licensing things like "generic photo of happy office workers sitting in an office" to put behind their logo in a banner on their website in ten years.

AI: Beyond Visual Art

The potential impact on visual artists seems to be the area of greatest interest (or at least the most spirited debates) right now, but in my opinion it's just the tip of the iceberg. Expect to see a lot of vaguely weird AI-generated prints on things like fabric, decorative paper of all kinds, and hotel room artwork, for example. More importantly, software that can mass-produce convincing images is now in the wild. The most easily accessible software has generally been constrained in hopes of making it harder to use for controversial or malicious purposes, but that is just going to delay the inevitable slightly. Get ready for society to splinter even further as people go down increasingly bizarre conspiracy theory rabbit holes based on visual "evidence" that came out of an AI image generator. Prepare to see an awful lot of disturbing imagery as internet trolls build automated offensive image generation systems that overwhelm content moderation due to the volume and uniqueness of every result.

Musicians are probably going to see similar advances in AI sooner rather than later. It's been in the works for a while. [ Edit: two days after I originally posted this, Riffusion, a tool that uses Stable Diffusion to generate music based on spectrograms, went live. ] If you want to differentiate yourself, look into things like microtonal scales and unusual sound generation techniques.

There's a lot of hype at the moment about AI-based text output, but I think it's misdirected. Like visual art generation, the output of AI text generators (ChatGPT, etc.) looks superficially convincing, and sometimes even the details happen to be accurate. However, it's not something one should trust to describe actual facts about the real world, because there is no mind there. It's essentially a much fancier version of the ELIZA software from decades ago. It's just as likely to generate factually incorrect output as output that's correct. That's fine if one is asking it to write a love story about a turtle and a frog, but if one is asking it to write a computer program or a legal contract, one still needs an actual human expert to go over the whole thing line by line to verify it. Maybe that's more efficient for some people, but it's faster for me to write things from scratch in my fields of expertise than to microscopically analyze existing text when I have to assume it may be 100% incorrect.

I think the main ways we're going to see AI text used in the next 5-10 years will be:

I assume that many of the most interesting and/or dangerous uses for AI (whether it's text, visual art, music, or anything else) won't even become apparent for a while. The basic technology is incredibly powerful, and I'm reminded of one of my favourite quotes from Alastair Reynolds' Absolution Gap:

"...the hypometric weapon represented a general class of weakly acausal technologies usually developed by pre-Inhibitor-phase Galactic cultures within the second or third million years of their starfaring history. There were layers of technology beyond this, Aura's information had implied, but they could certainly not be assembled using human tools. The weapons in that theoretical arsenal bore the same abstract relationship to the hypometric device as a sophisticated computer virus did to a stone axe. Simply grasping how such weapons were in some way disadvantageous to something loosely analogous to an enemy would have required such a comprehensive remapping of the human mind that it would be pointless calling it human any more." [ emphasis mine ]

i.e., less colourfully, disruptive technologies often have impacts that are difficult to fully comprehend (let alone plan for) without already having lived in a world where those technologies are commonplace. Imagine trying to explain a wireless voltage detector to someone in ancient Rome. "It's kind of like a lodestone, but it tells you if there's a miniature lightning bolt hiding inside a wire or not. How frequently do I have to worry about wires with lightning bolts trapped inside them? Well, one of the funny things about the future is that almost all of us have trapped-lightning-bolt wires running throughout our homes. Haha, yes, that does seem like a funny thing to willingly have inside a house, but it turns out one can use trapped lightning bolts to do all sorts of useful things. They're very dangerous, though - and it's easy to lose track of which wires are connected to each other - so when we're working on the wires, we need a way to tell if the lightning bolt is still inside or not."

What will the equivalents of wireless voltage detectors and AFCI/GFCI circuit breakers be for AI content generation technology? Ask me again in ten years once society has had time to absorb the change.

Epilogue: But What About AI Training Data?

A lot of the intense negative feeling in discussions about AI art is due to the impression that being able to approximate the style of specific artists is somehow stealing the work of those artists. I understand why people might feel that way, and I suspect it's such a strong feeling that no argument will change some (probably most) of their minds, but I think it's a mischaracterization. As in almost every field, artists are always standing on the shoulders of giants, regardless of the tools they're using. Every artist is inspired by other artists, and a huge number of them mimic the style of other artists at least as closely as AI image generators do.

Is there an outcry every time another person copies the characteristic manga/anime style derived from Osamu Tezuka's work? What about every time another artist borrows H.R. Giger's extremely distinctive style of biomechanical artwork without paying royalties or crediting his estate? How many films and videogames feature creepy alien technology that's basically copy/pasted from a Giger painting? How many spooky independent artists have made a career selling art that could have been produced by Giger?

I think this is how art (whether it's visual or otherwise) works. Technology just made it efficient enough that anyone can do it to some degree. If it's OK for Nickelback to basically sell slight variations on the statistical average of all popular rock music released in the five years before each of their albums, it's OK for someone to create art that looks like something another artist might have made in a parallel universe, as long as it's not actually an unauthorized copy of something that other artist really did make in this one. Copyright law already covers this.

1. I'm one of the few music fans who still (in 2022) buys albums (on CD where possible, digitally if not) by all of the artists they like, and even I recognize that's essentially a niche industry now.
2. I think this is based on the "Automation Should Be Like Iron Man, Not Ultron" article by Thomas A. Limoncelli.