Count the fingers, count the teeth: Why AI isn't taking over the Arts any time soon.

23.11.23

Dr. Sabine Weber is a computer scientist and organiser at Queer in AI. They are interested in multilingual systems, AI ethics, science communication and art. In their free time they do stand-up comedy and cook elaborate meals. You can find their writing here and here.

This is the third post in a series on AI in the arts. Check out the first, second and fourth pieces on our blog.

AI is scary. Despite having studied computer science and worked on AI for nearly a decade, I am still fundamentally uncomfortable having a “conversation” with ChatGPT. Part of it is due to the so-called uncanny valley - the idea that when an object becomes too similar to a human, we stop finding it cute and start finding it creepy (think, for example, of porcelain dolls, clown masks or the 2019 movie version of “Cats”).

But for many, the fear goes deeper than surface-level discomfort, straight down to existential dread. If an AI can draw a picture from a prompt, will I still get commissioned for artwork? If an AI writes personalised bedtime stories, will people buy the anthology I wrote? What will I get paid for if an AI writes ads for free?

There are many ways to answer these questions - economic, sociological, psychological. My answer comes from computer science. A simple truth about artificial intelligence is that it is not all that intelligent. In fact, the term itself is a great marketing strategy on the part of those who sell these systems. The internet is full of examples of users provoking ChatGPT into nonsensical statements, and it doesn't take much effort to make the chatbot falsely claim that my hometown holds an annual sausage race and has a TV tower called “tall Jakob”. AI-generated photographs regularly get debunked because the people in them have too many fingers, arms that attach at weird angles or eyes that look in the wrong direction. Why do systems that spew out business strategies and travel itineraries make such fundamental mistakes?

AIs are different from conventional computer programs. Rather than programmers writing down rules for how a program should behave given different inputs, an AI “learns” the rules from examples (which is why a more accurate term for AI is machine learning). The “learning” phase is called training, and the examples the system sees are called training data. After training, the system can reproduce the patterns it observed in the training data.
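To make that difference concrete, here is a toy sketch in Python. The examples, labels and function names are all invented for illustration - this is not how any real system is built. The first function follows a rule a programmer wrote down; the second derives its rule purely from labelled examples.

```python
from collections import defaultdict

# Conventional program: a programmer writes the rule down by hand.
def is_question_rule_based(sentence: str) -> bool:
    return sentence.strip().endswith("?")

# Machine learning: the rule is extracted from labelled examples.
# These examples and labels are invented for illustration.
training_data = [
    ("what time is it", "question"),
    ("where do you live", "question"),
    ("the sky is blue", "statement"),
    ("i like cooking", "statement"),
]

# "Training": count how often each word appears under each label.
word_counts = {"question": defaultdict(int), "statement": defaultdict(int)}
for sentence, label in training_data:
    for word in sentence.split():
        word_counts[label][word] += 1

def is_question_learned(sentence: str) -> bool:
    # Score each label by how often its training words reappear here.
    scores = {
        label: sum(counts[word] for word in sentence.split())
        for label, counts in word_counts.items()
    }
    return scores["question"] > scores["statement"]

print(is_question_rule_based("where do you go"))  # False: no question mark
print(is_question_learned("where do you go"))     # True: "where", "do", "you" were seen in questions
```

Nobody told the second function what a question is; it only picked up a statistical pattern from its four training examples.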

In the case of ChatGPT, the training data is huge amounts of text from the internet.

If you ask ChatGPT a question, the answer will be an agglomerated average of all the answers to that question that appeared in the training data. For systems that generate images from text prompts, the training data is images and their descriptions, and an AI-generated picture is more or less a combined average of all the pictures whose descriptions resemble your prompt.
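To give a flavour of what that averaging looks like, here is a toy next-word predictor in Python. It is a radically simplified sketch - real language models are vastly more sophisticated - but the principle of reproducing what typically came next in the training text is similar. The tiny corpus is invented.

```python
import random
from collections import defaultdict

# A tiny invented "training corpus". Real systems train on billions of words.
corpus = (
    "the tower is tall . the tower is old . "
    "the race is fun . the race is long ."
).split()

# "Training": record which words were observed to follow each word.
following = defaultdict(list)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word].append(next_word)

def continue_text(start: str, length: int = 5) -> str:
    """Generate text by repeatedly sampling a plausible next word."""
    words = [start]
    for _ in range(length):
        candidates = following.get(words[-1])
        if not candidates:
            break
        # Sampling from the recorded continuations is, in effect,
        # drawing from an average of the training data.
        words.append(random.choice(candidates))
    return " ".join(words)

print(continue_text("the"))
# Possible output: "the race is tall . the" - fluent-looking,
# but nothing in the model checks whether it is true.
```

Notice that the model can happily produce “the race is tall”: the grammar is borrowed from the training data, but no part of the program asks whether races can be tall.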

So why the errors when it comes to facts and fingers? To understand this, we need to look at the differences between how humans and machines learn.

ChatGPT is trained on large amounts of text, but it is trained on text only. It can pick up what a grammatically correct sentence looks like, and even what the rhyme scheme of a sonnet is. It is a language model - a good approximation of the structures we use to communicate with words. But ChatGPT has no access to the world outside of language. We as humans have an understanding of what a correct sentence looks like, too. But humans learn and use language by interacting with the world around them. In addition to our language model, we have a world model that encompasses knowledge of the world around us: that things always fall downwards, that an object can only be in one place at a time, and that my hometown has neither a TV tower nor an annual sausage race. In our brains the language model and the world model are connected, so we can say sentences that are not only grammatically correct but also factually true. That is something ChatGPT, by design, can never attain, and why it will produce sentences that are grammatically correct but factually wrong.

And a similar thing holds true for image generation from prompts. When we look at a two-dimensional picture, our world model helps us reconstruct a three-dimensional scene in our mind. We know that things can continue behind objects, that things farther away are not actually smaller, and at what angle an arm meets the body. But the only representation the AI gets is the pixels of the image and the words that describe them. Imagine sitting in a dark room, matching various combinations of colourful mosaic tiles to sentences in a language that you do not understand.

Sure, you can learn which patterns go with which words, but you have no knowledge of what is supposed to be represented. All you do is match words and tiles. And sometimes your guesses will be wrong, because you don't know what a finger or a hand is - you just produce the mosaic pattern that is similar enough to the examples you've seen.
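Here is a toy sketch of that matching game in Python. In real systems, images and captions are mapped to learned lists of numbers (embeddings); here those numbers, the captions and the function names are all invented for illustration. The point is that the “generator” only compares numbers - nothing in it knows what a finger is.

```python
import math

# Invented stand-ins for learned representations: each caption is paired
# with the pattern of numbers the system associates with its image.
gallery = {
    "a hand with five fingers": [0.9, 0.1, 0.3],
    "a hand with six fingers":  [0.8, 0.2, 0.3],
    "a tall tower at night":    [0.1, 0.9, 0.7],
}

def similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: how closely two number patterns line up."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def best_match(prompt_vector: list[float]) -> str:
    # The system has no concept of hands or fingers. It only returns
    # whichever stored pattern is numerically closest to the prompt.
    return max(gallery, key=lambda caption: similarity(gallery[caption], prompt_vector))

# A prompt whose numbers happen to land nearer the six-fingered pattern
# gets matched to it - "similar enough" is all the system can judge.
print(best_match([0.82, 0.18, 0.3]))  # "a hand with six fingers"
```

The six-fingered hand wins here not because anything went wrong, but because numeric closeness is the only criterion the system has.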

With machine learning, more examples - that is, more training data - generally means better results, and the big companies that create AI systems could, in principle, try to use all the data in the world. But adding more data will only bring incremental improvements, like better renderings of hands; it will not solve the fundamental problem. Only a change in system design could address the missing world model.

I wish I could promise that no artist will be replaced by an AI, but unfortunately the reality is different. Maybe, though, the knowledge that AI can't actually refer to anything in the real world counts as an argument for the necessity of human oversight. AI can neither depict nor describe; whatever it generates only receives its meaning from the people who interact with it. If art is about reaching out into the world and communicating, why would anyone be interested in a message that is only a weighted average of what was said before? Why would anyone read something that no one bothered to write?

I for one am still curious to read, watch and listen to art that is being put into the world by humans.

Want to know how AI could help your marketing campaigns? Contact us at Mobius!

If you'd like to keep up to date with all our blog posts, important and interesting stories in the worlds of theatre, arts and media, plus job ads and opportunities from our industry friends, sign up to our daily media briefing at this link.
