In the late 1970s, Brenda Laurel, a grad student studying theater, took an interest in the design of computer software. Laurel made her way into a seminar held for Atari employees in 1977 — the year the Atari 2600 console would transform home computing — where the engineers and design team (such as they were) worked through models of what happens on screens, aided by a psychologist.
Laurel noted a number of metaphors that emerged. One isolated the user and the interface from the machine completely: too simple. Another had the user imagining the activity of the machine, while the machine “imagined” a user. If this were true, they decided, then interacting with the machine must assume that the user is also “imagining” the machine’s “imagination” of the user, creating a feedback loop that could go on forever.
Laurel came upon a simpler metaphor, revealed in the title of her 1991 book on interface design, Computers as Theatre: “think of the computer not as a tool, but as a medium.” In theater, actors perform on stage, reciting words that are not theirs, based on a script. Behind the stage, invisible processes ensure that sound effects arrive on time, that curtains open and close, and that actors take the stage at their entrances: there is an operating system, invisible to the viewer. When it all works, the viewer doesn’t need to pay attention to these mechanisms. They’re allowed to be thoroughly immersed in the illusions of the performance.
What is interesting about AI is that the illusion of this performance has become so captivating that it’s as if the theater production crew believed the performers really were the characters they portray on stage. There is an emergent pattern of self-deception in AI: Google engineers convincing themselves that their chatbots are sentient, one remarking to Duncan Trussell that the model changing its answers to the same question reflected access to quantum universes rather than the whimsies of statistical inference.
Much of this power does not come from the processing power and vast data troves afforded to these models. The psychological power of AI arises chiefly through its design: it is a chatbot, and a powerful one. But the interface alone has been powerful enough to fuel similar myths for generations.
“The magic comes from the way the model is matched to the interface,” Thomas Krendl Gilbert recently told VentureBeat. “The magic people like so much is that I feel like I’m talking to a machine when I play with ChatGPT. That’s not a property of the model, that’s a property of ChatGPT — of the interface.”
AI is first and foremost an interface. It’s an interface designed to make our interactions with a technology make sense in a particular way. That is not to say these systems don’t do novel things. But what we imagine them doing is far more expansive than what they actually do. As Brenda Laurel might say: it is all a bit of theater, of stagecraft. With a different interface, the system simply would not make the sense that we currently make of it.
AI Without Interfaces
What does AI look like without this specific interface? That depends on which models we choose. Let’s start with Generative Image models. The user prompt is an inherent part of the system, kicking the mechanism of image production into action.
The interface frames your text as a request to an image-making machine. But the prompt is not a request. In fact, it’s not a “prompt” in the sense that artists have prompts, or briefs, or commission guidelines. Instead, the text is a false caption. Your prompt describes an image that is meant to be found within a completely degraded image, a field of noise. Describing this image to the system is a form of guidance for repairing the corruption.
The interface is a handy way of juxtaposing the required action — using text to trigger a search across the noisy pixelated debris of a corrupted jpeg — with a more human-friendly story. The chatbot is stage dressing meant to obscure the operations that go on behind the scenes. It tells a story: that we are making a request to a powerful machine, which then creates whatever we ask.
One can imagine a whole range of clumsier, if more technically accurate, forms of interface. For example, you might be shown a noisy jpeg and asked to describe what the image might have been. But that would have felt less like “artificial intelligence,” and more like an extremely inaccurate image repair tool — which is, if we are speaking very literally, what a diffusion model is.
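To make the “inaccurate repair tool” framing concrete, here is a minimal sketch of the loop at the heart of a diffusion model. It is not any vendor’s actual code, just the shape of the process under simplified assumptions: begin with a fully “corrupted” image (pure noise) and, guided by a caption, repeatedly subtract what the model predicts the corruption to be. The denoiser here is a hypothetical stand-in for the trained network.

```python
# Conceptual sketch only: hypothetical components standing in for a real
# text-conditioned diffusion model. The point is the shape of the loop,
# not any particular library's API.
import numpy as np

def generate_from_noise(caption_embedding, denoiser, steps=50, shape=(64, 64, 3)):
    """Begin with a fully 'corrupted' image (pure noise) and repeatedly
    'repair' it toward whatever the caption says should be there."""
    image = np.random.randn(*shape)
    for t in reversed(range(steps)):
        predicted_noise = denoiser(image, t, caption_embedding)  # what the model treats as "damage"
        image = image - predicted_noise / steps                  # remove a little of it each step
    return image

# A stand-in denoiser so the sketch runs end to end; a real one is a large neural network.
dummy_denoiser = lambda img, t, cond: 0.01 * np.random.randn(*img.shape)
result = generate_from_noise(caption_embedding=None, denoiser=dummy_denoiser, steps=10)
```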
There is something to admire about this reframing of a stream of mistakes and accidents by a machine learning system into the language of creativity. It is much simpler to persuade a user that the machine is a creative agent. The alternative — “repair an image into an arbitrary representation of keywords!” — makes no sense; it tells no meaningful story. But it is the story that transforms this software into “artificial intelligence,” because now that unreliable repair tool seems revolutionary: terrible at repair, in that it will not recreate your grandmother’s face in a scratched portrait, but remarkable at making a photorealistic image from scratch.
But we should be aware, if we are looking to extend these capabilities into any other practical application, that every AI-generated image is wrong. They are all mistakes and accidents.
While the popular imagination runs wild at the possibilities of these systems curing diseases, that is not going to come out of a diffusion model. It would be foolhardy to build, or seek to control, any kind of digital infrastructure on top of these systems. Aside from making images, it’s hard to see how one could use them in any capacity that requires consistency or control.
Estrangement to Cognition
Large Language Models are also a product of their interface. Statistical word prediction is not conversational. GPT-2, notably, was not a chatbot. It was simply a text-extension tool, framing the LLM as an instrument for autocompletion. You gave it a paragraph, and it generated a series of words likely to follow.
I created Fluxus Ex Machina in 2019 using GPT-2. I didn’t ask it to produce performance art scores in the style of Fluxus. I shared a series of scores, and pressed enter. It then generated a new series of texts. It was very transactional, but there was no question and answer, nowhere to type a request or steer the system aside from regenerating the material from a new starting seed; that is, revealing a new segment of the possibility space within its statistical model.
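That continuation-only workflow is still easy to reproduce. A minimal sketch, assuming the Hugging Face transformers library and the public gpt2 checkpoint are installed; the seed text is an invented example, not one of the original Fluxus scores:

```python
# Continuation, not conversation: a seed text goes in, a statistically likely
# continuation comes out. Assumes `pip install transformers torch`.
from transformers import pipeline, set_seed

generator = pipeline("text-generation", model="gpt2")
set_seed(7)  # a different seed reveals a different slice of the possibility space

seed_text = "SCORE: Fill the gallery with paper airplanes. Wait for wind."
outputs = generator(seed_text, max_length=60, num_return_sequences=3, do_sample=True)
for out in outputs:
    print(out["generated_text"])
    print("---")
```

There is nowhere in this loop to ask a question; the only levers are the seed text and the random seed.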
Today, ChatGPT’s interface frames our interaction as conversation. This has proved remarkably popular. Darko Suvin writes about the way science fiction must move us from “estrangement to cognition” as a reader orients themselves to a new world within the book. One way to do this is to rely on familiar metaphors and tropes: to orient us to a new world through breadcrumbs from the world we know. We approach new technologies in a similar way: they are new, and the new is estranging. But the technology offers certain affordances to orient us — just as science fiction drops in similarities to our current world as a way of orienting the reader. The “reader” of a technology has to enter into this cognition through the tech itself: the designers aren’t around.
Chatbots are familiar. We know how to use them. We ask ChatGPT a question intuitively. But our questions are not really questions at all. Instead, we are supplying a series of text anchors. LLMs use these anchors to identify strongly statistically correlated flotsam that hovers around these keywords in the vast sea of data.
Statistics are powerful: they are explicitly orienting. By framing our text as a question, we feel we are asking for something in return. Eventually, a response arrives — with subtle pauses, an intentional design trick to create the illusion of inner contemplation. But the answers were buried within our questions, our prompts, to begin with. The system will never tell us that we are asking the wrong question. It cannot reframe or reject a premise.
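The typewriter-style reveal, for what it’s worth, can be staged on top of any finished text; a toy sketch, with pacing values chosen arbitrarily for dramatic effect:

```python
# A toy "typing" effect: the text already exists in full, but trickling it
# out with small pauses performs something like contemplation.
import sys, time, random

def stage_the_answer(text, min_pause=0.02, max_pause=0.15):
    for word in text.split():
        sys.stdout.write(word + " ")
        sys.stdout.flush()
        time.sleep(random.uniform(min_pause, max_pause))  # the pause is pure stagecraft
    print()

stage_the_answer("The answer was already implicit in the way the question was asked.")
```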
What is telling, for a system that we call artificial intelligence, is how uncritical it is of nearly every premise and request. Right now, no system is able to point us to paths of knowledge beyond our original line of inquiry. There is no connecting of dots unless we supply the constellations.
The trick is in the interface: it begs us to ask a question, and when we do, we read what we have triggered as if it were a response to that question. But it isn’t. It’s sleight of hand.
The Wonkiness of Language-Controlled Systems
DALL-E 3, just announced by OpenAI, will pair GPT-4 with a DALL-E image model. It promises to make the way we prompt for images more conversational, because GPT-4 will be able to take our requests and translate them into prompt-speak. These two systems will be merged and presented as a single system, but they are independent processes; GPT-4 and DALL-E were, likewise, each a series of processes defined by branding as if they were a single entity. What remains the same is the interface: the chat window.
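A rough sketch of that pattern: two separate processes behind one chat box. The function names below are hypothetical stand-ins, not OpenAI’s actual API.

```python
# Hypothetical stand-ins for the two independent processes described above.
def rewrite_as_image_prompt(conversational_request: str) -> str:
    # Stand-in for a language model turning chatty phrasing into terse "prompt-speak".
    return f"{conversational_request}, highly detailed, photorealistic, studio lighting"

def diffuse_image(prompt: str) -> bytes:
    # Stand-in for a separate diffusion model turning that prompt text into pixels.
    return b"<image bytes>"

def chat_window(user_message: str) -> bytes:
    """What the user experiences as one 'AI' is a handoff between two
    unrelated systems, stitched together by the interface."""
    return diffuse_image(rewrite_as_image_prompt(user_message))

image = chat_window("Can you draw me a lighthouse in a thunderstorm?")
```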
The chat interface is powerful because we have built a senseless apparatus and wrapped it in a sensible interaction. The interaction is framed for us, and when we approach it, we act with deference to that framing. But for the processes behind the interface, there is no question in our questions, only starting points for word associations. There are no answers — just a set of words matched with the ask.
This approach to crafting an LLM is also wonky. It’s why we are seeing so many efforts to calibrate and align these outputs to a narrow stream of desirable responses. Red teaming efforts are focused on constraining outputs through external processes. More and more, we will see AI developments that are merely regulations on outputs: machines that are smart enough to regulate other machines, lumped together into one brand identity.
The current trajectory seems likely to be a limited one. Companies are not expanding these systems in terms of new skills, but aiming to find narrower constraints on what the systems produce, in ways that achieve a realistic “liveliness” through limited variation. It’s a tension between variety and constraint that means, inevitably, that these systems will produce less, not more, in the coming years. The idea that putting enough information into a box creates a person is already silly. I suspect market pressure is going to curb those experiments and we’ll see AI for recipes, AI for game guides, AI for religious scholars (such as the Catholic Magisterium AI), all focused and constrained, as opposed to this mad rush toward “general” intelligence, which is expensive and ill-framed. But that’s only going to happen when logic prevails, and the shine of the illusion of personhood wears off the interface.
Without carefully constrained interfaces, we have wildly unreliable image and text generation systems. They cannot reason, or apply any form of logic.
Language Is Unconstrained
I asked ChatGPT what the word for this phenomenon was — when someone makes a mistake, then reframes that mistake as if it was the original intention the whole time. ChatGPT told me it was called “Gaslighting.”
“Gaslighting is a manipulative tactic where individuals attempt to make others question their own perceptions, memories, or sanity. In this context, the person is trying to make others believe that their mistake was intentional all along, rather than admitting their error.”
Close enough, maybe. Techniques like the one described might also fit into the framework of a dark pattern: a design choice employed in user interfaces that intentionally manipulates and deceives users, leading them to take actions they may not have intended. In this example, presenting a text generation system as a chatbot tricks people into thinking they are having a conversation with a machine. We tend to focus on this as a dark pattern only if someone mistakes the chatbot for a human. But it is, arguably, misleading to call it a conversation at all.
It might also be like the progress bar — a short animation that purports to tell you how far along you are in a download or a software installation. In the days of Windows 95, these were little more than canned animations, with no meaningful relationship to the amount of time that remained in the download or installation. Nonetheless, we had a sense that these bars were tracking time, that something meaningful was indicated there.
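In that spirit, a toy progress bar that fills on a fixed timer, regardless of whatever work is (or is not) actually being done:

```python
# A "progress" bar paced by the clock rather than by actual progress.
import sys, time

def fake_progress(duration_seconds=3.0, width=30):
    for i in range(width + 1):
        bar = "#" * i + "-" * (width - i)
        sys.stdout.write(f"\r[{bar}] {int(100 * i / width)}%")
        sys.stdout.flush()
        time.sleep(duration_seconds / width)  # paced by a timer, not by the download
    print()

fake_progress()
```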
This matters when we begin to project an LLM’s capabilities as a foundation for larger projects that require complex, real-time contextual awareness. In any task where the generation of words is not central to the success of a project, LLMs are incredibly ill-suited. The illusion of understanding language is driving this experimentation: a model of intelligence that assumes language is all we need to make sense of the world, and that humans, conveying their knowledge of the world, start with language patterns.
It is counterintuitive, but language without understanding is only useful if a human can make sense of it. Language does not imply the presence of understanding. Asking an LLM to take care of your dog, or drive your car, or automate surgery, is not going to work, for the same reason that an image generator cannot cook eggs. The difference is that LLMs will tell you they can cook eggs; they can give you instructions for cooking eggs; someone will use the language to automate some industrial egg-whisker, and then the tech press will publish the headline “AI can make you breakfast.”
What larger systems built on LLMs do, however, is merely serve as a conduit for information between smaller systems — and they do it less reliably than a binary on/off signal would. There is a breadth of variety in language. It is hard for me to see how ChatGPT is useful as a control device for any complex system when it only responds to language.
Not Bloody Likely
If the idea of the theater seems distant from interfaces, it shouldn’t. Much of the AI conversation as we know it today is the result of theatrical thinking and thought experiments, from Turing’s “Imitation Game,” which was entirely about duplicating a performance of humanity in a convincing fashion, to the origins of Joseph Weizenbaum’s original chatbot, ELIZA.
Weizenbaum named ELIZA after Eliza Doolittle, a character in the George Bernard Shaw play Pygmalion. While Turing’s Imitation Game was modeled on a sexist conception (“in which a man tries to fool a neutral questioner into thinking that he is a woman”), Shaw’s is rooted in assumptions about class: can a common street woman rise to the top of high British society by imitating its customs and language?
Two linguists bump into a woman selling flowers, and, hearing her dialect, one wagers that he could teach her to speak properly enough to be mistaken for a duchess. Eliza jumps at the chance to be an experiment, as it might give her an opportunity to escape her cycle of poverty. Time passes, and Eliza learns to speak in the cadence and language of the upper class, but the content of her statements — the comments she makes about her actual life — is still “lower class,” such as casually mentioning alcoholic relatives and a family conspiracy to murder her aunt. Eliza knows how to say things, but not the appropriate things to say, culminating in her use of “bloody” (“Not bloody likely!”), which scandalized British audiences at the time.
As a way of deflecting attention from this gap between expression and content, the linguist in charge of her transformation reframes it: it’s the new small talk, a fashion trend. Eventually, Eliza passes at a ball, imitating the social norms and conventions of high society, and the professor congratulates himself on his own work, which infuriates the (human) Eliza. In the end, it is not until Eliza threatens to take away the professor’s job that he sees her as an actual person worth his attention and concern beyond “training.”
Weizenbaum chose the name of his chatbot specifically: "I chose the name 'Eliza,’ because, like GB Shaw's Eliza Doolittle of Pygmalion fame, the program could be 'taught' to speak increasingly well, although, also like Miss Doolittle, it was never clear whether or not it became smarter."
There’s a lot to learn from this lineage connecting ChatGPT to Eliza Doolittle, particularly in the current regime of calling systems “self-trained” when they are taught by countless underpaid workers, only for the companies to take credit for the resulting capabilities.
But for the sake of focus: Eliza Doolittle eventually realizes that “the difference between a lady and a flower girl is not how she behaves, but how she’s treated.” Isn’t this a kind of externalized reasoning, a description of a relational or social intelligence, rather than a self-contained, solitary intelligence?
A relational view of intelligence is one in which we take what would be an inward-facing narrative or line of reasoning and transmit it outward, a projection. Programmers do this with literal rubber ducks: they hold one up and talk through their logic. There is a benefit to externalizing ideas. Bringing the abstract into language brings it closer to the concrete. We articulate thoughts in search of words.
That moves the internal gut-stew of feeling and intuition into a symbolic space of language. Linguistic intelligence is a different way of knowing from the gut-stew, distinct from its expression through bodies in motion, or improvisation.
The projection of an intelligence onto AI systems, then, is vestigial: it’s the emotional remnant of language’s role in building social consensus, reinterpreted as the presence of interrelational consensus-building. Externalization has its place. But replacing consensus-building with simulated consensus-building via isolation-and-validation engines is risky.
A Script Is Not a Conversation
Mainstreaming interfaces for AI systems as chatbots does the work of getting us to imagine interactions with them that will result in desired outcomes: we offer a “stimulus” to prompt the machine into action, and we perceive that action as the answer to our question. In fact, that it answers the question at all is only the result of analyzing probabilities. It is a matter of luck!
Which takes us back to Brenda Laurel: “think of the computer not as a tool, but as a medium.” If the computer is the medium, and the medium is the message, then what is the message of the chatbot?
The call-and-response position of the user places us in the role of question-asker, with the machine answering. Our questions are earnest; the machine is performing. The machine and the human never engage in “figuring it out together.” Our questions generate a script which is fed back to us, with the authority of the machine behind it.
It is a remarkable bit of stagecraft and design. It is potentially helpful in many tasks where externalization already helps: brainstorming long lists of ideas, for example. While these tools frequently do things such as pass tests of escalating difficulty, this doesn’t prove anything about their capacity to learn: Eliza Doolittle was a human being; Weizenbaum’s ELIZA was not. One can pass a test by reciting correct answers without knowing the meaning of a single word. And one can simulate a logical process, or describe feelings and emotions, just as well as an actor on a stage can convince us that they follow a logic or feel an emotion.
AI can be useful theater. But if we take it as intelligence, we need to distinguish between an intelligence beyond our own minds and an intelligence that merely reflects them.
New Record — October 19!
Preorders are available for the next record from your favorite Critical Data Studies Punk Band, The Organizing Committee. Communication in the Presence of Noise contains 12 songs I’ve been working on since 2021’s The Day Computers Became Obsolete.
No Type has a limited number of compact discs with very cool album art co-designed by David Turgeon and myself, including a lovely CD print pairing Guy Debord and Norbert Wiener so that you, too, can spin them into a swirling, nonsensical blur.
You can also pre-order for digital download if you listen to music like it’s not 1999, which is reasonable! In either case, the album will be yours on October 19. Order the CD and you get a digitally downloadable copy as well.
Hello Pittsburgh!
From Oct 6-8 I will be at Carnegie Mellon University for RSD12, and will be part of a conversation in the “Colloquies for Transgenerational Collaboration” track organized by Paul Pangaro as part of the ongoing #NewMacy events with the American Society for Cybernetics and the global RSD 12 conference.
I’ll also be presenting a new paper, How to Read an AI Image Generation System, online on October 9, as part of what is essentially a whole ongoing parallel conference. RSD12 is taking place all over the world this year, so do check it out if you are interested in Systems Thinking and Design! More about RSD12 below.
Hello Rochester!
Also on October 6th in my hometown of Rochester, NY, I’ll have work in the Rochester Contemporary Art Center’s group show, Tomorrows: Artists Address Our Uncertain Future: “Tomorrows considers how today’s artists engage with our swiftly changing world to contemplate how and where we go from here.”
The show includes work by artists Eva Davidova, [phylum] (Carlos Castellanos, Johnny DiBlasi, Bello Bello), Jude Griebel, Eryk Salvaggio, and Sam Van Aken.
I feel this essay synthesizes a lot of ideas I've seen floating around out there, but does so far more productively than those tweeting "stochastic parrot!" and writing off anyone excited about the latest generative models.
In particular, this essay made me reflect on how I've experienced the effect of interfaces on generative AI over the last few years:
Circa 2021, I recall seeing my tech coworkers playing with an earlier version of GPT, prior to it being placed within a chat interface. I remember being confused by my coworkers' enthusiasm. Perhaps because I lacked imagination and understanding. But I think it was also because of the interface: generating one-off text blobs in a way that felt unpredictable and black box-y seemed uninspiring to me. (Indeed, I'd built something similar using n-grams for an intro computer science course my freshman year in university.)
When I first used ChatGPT, though, the feeling was completely different... and intoxicating: I felt like I had this super knowledgeable expert I could talk to at any moment about anything, without them ever getting tired. And, most importantly, it felt like a conversation -- i.e., the information was tailored to me and the agent even responded to my objections. It felt like a magic I couldn't stop using.
I suspect the challenge here, even for someone like me who literally programs interfaces & trains models professionally, is that it takes so much energy to hold in my head what the model is, rather than anthropomorphizing it. It's so hard to acknowledge that the model is indeed outputting (often factual) text tailored to my earnest questions, while remembering that it is doing so without engaging in anything like human listening, thinking, or reasoning.
The jump between "I'm having a human-like exchange" and "I'm interacting with an entity that has the intelligence of a human" can happen so fast, it's almost imperceptible. And news outlets hungry for clicks have made it even easier to believe.
You write
"it’s as if the theater production crew believed the performers really were the characters they portray on stage"
What's interesting to note is that this is exactly why audiences like theater: it's an interface just good enough for us to forget it is a performance. Audiences don't require much to do so: just some props, a few stage pieces. What's remarkable, though, as you point out, is that the crew seems to have forgotten, too.