12 Comments
Oct 5, 2022 · Liked by Eryk Salvaggio

Note - this article makes the assumption that Dall-E 2 uses the LAION 5B dataset and that therefore the base dataset is searchable.

To my knowledge this is not the case - the Dall-E 2 dataset is internal to OpenAI and has not been publicly divulged. LAION is independent of that.

The other pitfall is that something like https://haveibeentrained.com likely uses CLIP embeddings (though they give very little info) to identify and pick images, i.e., another image recognition model with its own biases. A different prompt can surface more 'real' kisses, such as "photo of a kiss".

A lot of this is valid, but I think an oversimplification, because you really have the interaction of multiple AI models: one that interprets the semantic content of your phrase, and another that turns that semantic content into an image.

Because 'humans kissing' is a much colder and more technical description of the image, you get much more awkward kisses out of that. Normal people's intimate photographs will not have been labelled 'humans kissing', so you're more likely to get those simply asking for 'photo of a kiss' or similar - likewise in Dall-E, you will get vastly different vibes between those two prompts.

Interestingly, 'photo of a kiss between two women' does not trigger content warnings for me either, and is generated without issue, which says a lot about the semantic gulf between 'kissing' and 'a kiss', as well as the clinical usage of 'humans' versus omitting that altogether (what else is Dall-E gonna do, show us chimps kissing?)

author

I've adjusted the text to reflect your correction and I'm happy to acknowledge your contribution if a revised version of this ends up somewhere else. Thanks!

author

Yes, it's an oversimplification - I am trying to outline a method at a high level, using these prompts as an exercise. My argument is only that this is an approach, not that the findings are in-depth.

You're correct that I collapsed the LAION dataset with DALLE's, in my haste to provide samples of training data sets. I'll make it more explicit. Of course, DALLE is proprietary. Thanks for flagging that for clarity.

Finally, I think your point on the prompts is an excellent example of the kinds of questions we might ask that the method supports. So this question / interrogation of prompts is also part of the interaction between the human and the dataset, another pathway for bias to be introduced. By far not the only one.

Again, this post is intended to be an early sketch of the types of questions we can ask, and methods we can use, to investigate these systems.

Oct 2, 2022 · Liked by Eryk Salvaggio

This is a terrific post, and I'm going to add it to my Theory of Knowledge unit on what AI can teach us about how technology shapes our knowledge. Thank you.

author

Thank you, and thanks for letting me know!


A clarification, please!

In "1. Create a Sample Set" the method is unclear. You generate a bunch of pictures, ok, I am with you. Then it sounds like you pick out "notable" images manually from a larger pile of generated pictures, and use that "notable" set as the sample you perform the first stage of analysis on? Or have I got this wrong?

author

Hi! No problem.

In DALLE2, you enter a prompt and get four images. If any of those images are notable (i.e., they have a striking quality relevant to your original question), save them to a folder. If not, don't. Either way, whether you have saved an image or not, run the exact same prompt again. See if any strike you as interesting. Save them. Repeat.

When you have nine or more (I often use many more!), you can put them in a grid. That's not strictly necessary; just put them somewhere you can look at them in association with each other. Then you can start asking questions about them.
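(If it's useful to anyone reading along: the grid step above can be automated. This is my own minimal sketch using Pillow, not anything from the post; the folder name, file format, and cell size are assumptions.)

```python
# Sketch: paste the first nine saved generations from a folder into a 3x3
# contact sheet, so they can be viewed in association with each other.
from pathlib import Path
from PIL import Image

def make_grid(folder, cols=3, rows=3, cell=256):
    """Arrange the first cols*rows PNGs from `folder` into one grid image."""
    paths = sorted(Path(folder).glob("*.png"))[: cols * rows]
    grid = Image.new("RGB", (cols * cell, rows * cell), "white")
    for i, p in enumerate(paths):
        img = Image.open(p).resize((cell, cell))
        # Fill the grid left-to-right, top-to-bottom.
        grid.paste(img, ((i % cols) * cell, (i // cols) * cell))
    return grid

# Example (hypothetical folder name):
# make_grid("notable_kisses").save("sample_grid.png")
```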

You would later start applying your hypothesis more objectively to random images, as described later in the post.

Hope that helps!


Yes! Thank you.

Does this not create a problem in that your hypotheses are then built on your own human biases as well as on biases present in the images you're looking at? Might you then spend the next several steps, in part, merely confirming your own biases?

author

It’s useful to consider the phase that this work takes place in. You’re using it to generate a hypothesis and find ways into the interpretation of an image. There is really no way to “eradicate your own biases” in analysis work, so instead, the awareness of those biases needs to be front and center in your mind as you do the work. That’s beyond the scope of the article, but definitely something I am always thinking about.

But in the end, once you have come up with a hypothesis about what it entails, your best bet is to treat it with academic humility. It is meant as a tool for opening up questions you can explore and theories you can formulate. After that, there are a range of approaches (qualitative or quantitative) that you can apply to test your theories. But this will be different depending on how you frame your thesis, what you want to answer, and the methods you want to use. You can definitely check out other writing about media analysis methods to get at that question afterward.

But yes, it’s as biased as any other research question in that something will spark your interest and you will move toward exploring why. Being aware of and accounting for those biases as best as you can is part of the process.


Got it, thank you!


I tried the same on an uncensored version of Stable Diffusion...

Curious to know what you think about it

https://olivierauber.medium.com/le-baiser-artificiel-f2c60f48926b


And sometimes an image is just a random idea.
