The artist of my memory is crafting hazy recollections into non-existent eyes. How do we get the pictures into the mind? Our eyes are an interface. It’s more than an eye, though, isn’t it? It’s a filter — a filter shaped like a body. A filter in the shape of us.
I am not sure that everybody looks at the eyes. I’m fascinated by looking at AI images and the slow deterioration of plausibility that comes from holding their gaze. Of course, the gaze is also directed toward us, in how AI is built on a backbone of surveillance. Look at these images and you see eyes; look long enough and you see unnatural breaks.
This is a sequence from a larger work in progress, Human Movie (so named to match the trend of assigning “human” as a modifier to things only humans can do: human writing, human art, etc.). The film revolves around fragments of text I’ve written, a series of short meditations on compression algorithms. I’m trying to think through the metaphors of artificial intelligence systems as they reflect human thought, memory, and experience.
Often, these metaphors are invoked without attention to themselves as metaphors. This is unfortunate, because metaphors are the vehicles of poetic and associative thought. The poetry of the AI system is hard to get at. It takes a certain ironic distance to find the overlap between the machine and the mind. We have to ask what it means to call these systems copies of the mind, and to draw out the distinctions: the difference that emotion makes to experience. So I wanted to make a film exploring artificial intelligence exclusively through its metaphors, to recenter the human at the heart of them.
The non-existent eyes are the images we write into our heads when we recall past events. A Filter In the Shape of Us is the second part of a segment:
Are we artists of memory, crafting hazy recollections into nonexistent eyes? I've heard that memory is reconstructed, not recalled... that we rewrite the past onto the mind rather than opening memories up like files on a lost hard drive. Could it be true? That every memory is as false as a photograph?
So, this is a work in progress. If you’re reading and would like to discuss showing this piece or commissioning the larger work (or other work), I’m happy to talk!
The Stubborn Myths of Generative AI
I’m thrilled by the response to my piece in Tech Policy Press, which addresses the myths surrounding generative AI and the hold they have on our collective thinking. If you haven’t read it, it’s the most thorough thing I’ve written this year; I hope you’ll check it out. And, as always, if you found it compelling, please circulate it!
Re: LAION
LAION-5B, the dataset behind Stable Diffusion and other open-source image generation models, was taken offline late last year after Stanford researcher David Thiel found links to banned images of child abuse in the training data, which I wrote about in Tech Policy Press earlier this year. Now LAION has released an updated version of the dataset without links to that material, thanks to the time put in by watchdogs including the Internet Watch Foundation and the Canadian Centre for Child Protection.
To be clear, I’m grateful for open source datasets and models because they allow me to do my research into the cultural logics of AI systems. If LAION weren’t open and auditable, we’d never know this content was embedded in generative AI. But it’s more complex than that, too.
It rubs me the wrong way to see LAION complaining, in this announcement of the new dataset, that they found out about the presence of abuse imagery via the press rather than hearing from researchers first.
Here’s the thing: you should know what’s in your datasets. If LAION could find that content today, they could have found it two years ago. Partners like the Internet Watch Foundation should have been consulted before the dataset ever shipped, not after. It is *absurd* to think you could scrape 5 billion images indiscriminately from the web and not end up with something awful.
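To make that point concrete, here is roughly what a minimal audit pass could look like: a toy sketch in Python, assuming a parquet shard of URLs and a watchdog-supplied hash blocklist. The file names and the "url" column are hypothetical, and real audits rely on perceptual hashing tools like PhotoDNA and hash lists held by child-safety organizations rather than exact MD5 matches; the point is only that scanning your own links is tractable.

```python
# Illustrative sketch only: check a web-scraped image dataset against a
# hash list of known-bad material before redistributing it.
# File names and the "url" column are hypothetical placeholders.
import hashlib

import pandas as pd
import requests

# LAION-style datasets ship as parquet shards of URLs plus captions/metadata.
metadata = pd.read_parquet("metadata_shard_0000.parquet")

# A list of known-bad hashes supplied by a child-safety watchdog (hypothetical file).
with open("blocklist_md5.txt") as f:
    blocklist = {line.strip() for line in f}

flagged = []
for url in metadata["url"]:
    try:
        resp = requests.get(url, timeout=10)
        resp.raise_for_status()
    except requests.RequestException:
        continue  # dead links are their own problem, but not this one
    digest = hashlib.md5(resp.content).hexdigest()
    if digest in blocklist:
        flagged.append(url)

print(f"{len(flagged)} of {len(metadata)} links matched the blocklist")
```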
Nor was this the first suggestion that this type of content might be present in the training data. Abeba Birhane et al.’s work from 2021 (!) had already reported a range of violent material in the dataset that had slipped through LAION’s filters, the first red flag that whatever those filters were doing wasn’t working.
If LAION had taken that warning seriously, they could have addressed the problem sooner. They didn’t. That’s on them. I am further annoyed to see them minimize the impact of redistributing links to child abuse imagery by calling it a fraction of the overall dataset instead of what it was: 2,236 links to CSAM, as well as personal data that could be used to identify other children whose images appear in the dataset.
Kudos for resolving the problem, but shipping the responsibility for auditing your own dataset off to third parties and child protection agencies, and then making demands on them to do it differently, is a deeply unserious and irresponsible position. It’s clear that the organization behind this dataset is neither concerned with nor capable of understanding the most important issues surrounding what it is building.
AI DOOM, But Not Like That
A new paper describing a new kind of “game engine” (but not really) dropped this week. It’s compelling: a diffusion model that can generate frames of the old-school video game Doom. Not only can it generate frames, it can generate them in response to a player’s movements and track world state (did you kill an enemy, find a key, etc.).
At first, I was skeptical, because of course I was. But despite the hype being pitched around it, the paper is pretty straightforward in its claims, and the model does seem to do what the authors say. So what do they say, as opposed to what the hype says?
Important caveats: I haven’t touched the controls of the Doom model myself, and the images above are likely cherry-picked for stability and reliability.
Generating video of a game world isn’t so hard. Creating a playable world that people can interact with, however, is somewhat novel. It’s important to note that what they’ve built sits on top of Stable Diffusion and is highly constrained to get the results we see here. It’s not an example of the “emergence” of some new capacity within a diffusion model. Instead, it leverages diffusion to render a set of images on the fly. The specific innovation is tapping into an LSTM (Long Short-Term Memory), which operates as a buffer filled with previously rendered frames.
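To make that concrete, here is a toy sketch of the general idea: each new frame comes from a model conditioned on a rolling buffer of recent frames and player actions, and nothing else. The stub `denoise_next_frame`, the buffer length, and the frame shape are all hypothetical placeholders I've invented for illustration, not the paper's actual architecture.

```python
# Minimal sketch of frame generation conditioned on a rolling buffer of
# recent frames and player actions. The stub model just returns noise;
# a real system would run a conditioned diffusion denoising loop instead.
from collections import deque

import numpy as np

FRAME_SHAPE = (240, 320, 3)  # toy resolution
BUFFER_LEN = 16              # how many past frames/actions the model "remembers"


def denoise_next_frame(past_frames, past_actions, rng):
    """Stand-in for a diffusion model conditioned on recent frames and actions."""
    return rng.random(FRAME_SHAPE, dtype=np.float32)


def play(actions):
    rng = np.random.default_rng(0)
    frames = deque([np.zeros(FRAME_SHAPE, dtype=np.float32)], maxlen=BUFFER_LEN)
    past_actions = deque(maxlen=BUFFER_LEN)
    for action in actions:  # e.g. "forward", "turn_left", "fire"
        past_actions.append(action)
        next_frame = denoise_next_frame(list(frames), list(past_actions), rng)
        frames.append(next_frame)
        yield next_frame


# "Collision" falls out of prediction, not physics: if the buffer shows the
# player walking into a wall, the most likely next frame is more wall.
for frame in play(["forward"] * 8):
    pass
```

The point of the sketch is the data flow: there is no map and no collision geometry, only a window of recent frames and inputs feeding a frame generator.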
A few things about the hype here, though. First, the model had to be trained on an existing game. As a proof of concept, the result is fascinating. As a replacement for game devs, it’s still fuzzy: it’s a game generation engine that requires you to build a playable game first, and then lets you extend the visual world of that game. And while Doom is complex compared to some video games, this approach won’t compete with sprawling role-playing games or story-driven puzzles. It works for extending dungeons in an existing style or making mazes for players.
Notably, though, it works by replicating sequences in a latent space. We might be tempted to assume we cannot move through a wall because the engine has some hidden understanding of space. In fact, we cannot move through the wall because the most likely continuation of “moving into a wall” is another image of the wall. It’s a great example of the mental models we make of space (in a video game or elsewhere) and how readily we project them onto simulations spawned by generative AI.
Nonetheless, I am interested in and somewhat impressed by this capacity—based solely on what I’ve read in the paper and assuming that it is accurate. Hopefully, we can test the outcome ourselves sometime soon.
Things I am Doing This Month
I’ll be speaking at the Gray Area Festival on September 12-15, alongside Lynn Hershman Leeson, Rashaad Newsome, Casey Reas, Trevor Paglen, Lauren Lee McCarthy, Ranu Mukherjee, Morehshin Allahyari, and Victoria Ivanova!
You can find more info and buy tickets through Gray Area, which also has a wildly generous collection of past talks.