This week I was reminded of the Stafford Beer adage that “the purpose of a system is what it does.” I referenced it in a longer piece published at Siegel this week, tackling the challenge of artificial intelligence through a multidimensional lens: understanding the emergence of “intelligence” in AI is only possible in the context of social, physical and digital infrastructures — and this includes the way they shape and deploy their own power. (I hope you’ll read it!)
I really do hope you’ll read that, but if you want to hang out here, I’ve got more to say about something else that applies to Stafford’s adage.
This week saw the launch of a new text-generating technology from Meta, Facebook’s parent company. Galactica was introduced in its white paper as “a large language model that can store, combine and reason about scientific knowledge” (emphasis mine). The authors note that Galactica was trained on “48 million papers, textbooks and lecture notes, millions of compounds and proteins, scientific websites, encyclopedias” and more.
The purpose of Galactica, as they state it, is to automate the search for scientific knowledge. They suggest that search engines today “require costly human contributions,” i.e., human vetting (such as Google’s reliance on Wikipedia, and Wikipedia’s volunteers). They propose that a machine could assimilate this information instead, by training on a curated corpus of high-quality research data.
It’s no surprise that Galactica’s outputs reflect this massive training set of research papers. They look like academic papers, complete with citations, though those citations are not always accurate (and sometimes don’t exist at all). They are structured like academic papers and read authoritatively, while conveying convincing but inaccurate versions of scientific and research concepts.
Large language models (LLMs) are text prediction engines, and Galactica promised to do something more than autocomplete: it assigned tokens to ideas, in the hope that those ideas (facts) would be preserved and reflected in the output. But like any LLM, the model has no grasp of context: it only knows word associations and the frequencies with which they occur. Across 48 million papers, a machine, rather than a human being, was in charge of identifying and labeling these “facts.” So the context of those facts was lost in a sea of weighted word sequences.
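To make “word associations and frequencies” concrete, here is a toy sketch in Python (my own illustration, not anything from Meta’s codebase): a bigram model that only tracks which word tends to follow which, and generates text by chaining those associations, with no notion of whether the resulting claims are true. Galactica is a far larger transformer network, but the underlying objective, predicting the next token, is the same.

```python
from collections import Counter, defaultdict

# A tiny made-up corpus standing in for the "48 million papers".
corpus = [
    "the compound binds to the receptor",
    "the compound binds to the enzyme",
    "the agent is toxic to aquatic wildlife",
]

# Count how often each word follows each other word.
follows = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1

def predict_next(word):
    """Return the most frequent follower of `word`, or None if unseen."""
    counts = follows.get(word)
    return counts.most_common(1)[0][0] if counts else None

# Generate text by chaining predictions: fluent-looking word sequences,
# with no grounding in whether the statements are factually correct.
word, output = "the", ["the"]
for _ in range(6):
    word = predict_next(word)
    if word is None:
        break
    output.append(word)
print(" ".join(output))
```

Run it and you get “the compound binds to the compound binds”: grammatical-ish, confident, and meaningless. That is the failure mode at issue here, just at toy scale and without the statistical sophistication that makes a real LLM’s output so much more persuasive.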
Recall the challenge of generating AI images of human hands: the model has no sense of what a hand is supposed to be, or look like. So AI-generated hands get crushed, grow extra fingers, extend in strange directions. Galactica is like a DALL-E 2 for research, and its facts grew in strange directions, too.
In the conclusion of Meta’s Galactica paper, the writers are quick to point out that “[w]hile Galactica absorbs broad societal knowledge through sources such as Wikipedia — [e.g. it knows] Kota Kinabalu is the capital of Malaysia’s Sabah state — we would not advise using it for tasks that require this type of knowledge as this is not the intended use-case.”
Which is all well and good, you might think, except that they then launched this model on a publicly accessible website for anyone to use, anyway. And if generating knowledge wasn’t the use case — then what the hell was the use case?
The results were predictable.
Other outputs included an article about the VX nerve agent that failed to identify that it is deadly to humans (it warned only that it had “long lasting effects on aquatic wildlife”), and an entire, less dangerous but certainly silly, article about bears in space that reported some surprising findings.
And so we might turn, briefly, to the Stafford Beer adage, and propose a simple test for Galactica, and perhaps for all AI demonstrations: the “purpose of a system is what it does” test.
We know that the “purpose” the authors ascribe to it is, by their own admission, not something the system does. The paper states that Galactica’s purpose is to solve a problem: “Researchers are buried under a mass of papers, increasingly unable to distinguish between the meaningful and the inconsequential.” Galactica is meant to help.
Sounds great, but they don’t get there. They acknowledge that the AI “hallucinates,” a fancy way of saying it generates incorrect information. So if Meta knows the system isn’t fit for purpose (it can’t “store, combine and reason about scientific knowledge,” and it compounds the challenge of distinguishing the meaningful from the inconsequential), then I’m left with a question:
What does Galactica do?
It allows the user to type a short prompt on any topic.
It generates “confident but wrong” research papers from that prompt.
It does so quickly: a single author could generate many papers in an hour.
It does so anonymously: the content can be copied and redistributed without any acknowledgement that it came from an experimental AI system.
A: Galactica is a tool that lets users type a short prompt on any topic, quickly generate as many inaccurate but convincing research papers as they want, and distribute those papers to anyone they choose.
In other words, Galactica is a misinformation engine.
To its credit, Meta acknowledged the risks and shut the demo down after just three days, thanks to tech critics who pointed out the possible abuses. And I could go much further down the rabbit hole about where the state of AI actually is versus where tech companies want us to believe it is — and how a general absence of AI literacy lets them get away with blatant marketing hype disguised as research.
But I’d like to show more restraint and leave it at this: The more invested you are in a system of any kind, looking at it through the lens of its intended purpose, the more difficult it is to see what it actually does. The frame of good intentions obscures the impacts that fall outside the purpose you initially ascribed to it.
In Defense of Criticism
There is a tendency in the AI development space to reject technology critics as killjoys getting in the way of technological progress. That rejection is the battle cry of those who allow good intentions to define good outcomes, regardless of the reality.
I tell my undergraduates that the purpose of critique is to understand whether or not the thing in their heads made it out into the world in a way that other people can understand or use. Designers and engineers — and critics, too — would do well to keep this in mind. Critiquing AI is not about dunks. It’s about letting those with the power to build this technology see how their intentions manifest as real outcomes.
So when the head of Meta’s AI team complains that this AI was removed from public use, arguing that there has been no documented case of harm, it’s bewildering. The value of critique is to highlight conceivable harms before real disasters highlight them for you.
At the moment there are three strands of backlash against AI ethics, as I see it.
The first is that AI ethics serves as whitewashing for corporate interests: look at who funds the biggest consortia, and you’ll see a list of the biggest AI companies and the rap sheets that come with them. It’s challenging to view these products critically from the outside when you are invited in, and big tech can block the publication of results it doesn’t like.
The second strand is that AI ethics as a field is too concerned with hypothetical harms at the expense of technological advancement: a group of woke scolds with regulatory power who need to get out of the way of innovation. This group believes that innovation will solve the social problems that underlie the products they build, if only they get to deploy and calibrate them through use.
The third strand is that power is not embedded in the ethics made by elites, but in the development of just tools that can serve communities instead of profits. This view sees ethics as a code of conduct aimed at the bare minimum: reducing harm rather than increasing justice. The criticism is that agreements hashed out between parties that don’t represent the potentially harmed will never be appropriate.
Ultimately all of these come down to the same adage: that the purpose of a system is what it does. Healthy systems have healthy outcomes. Unhealthy systems have unhealthy ones. And if you aren’t sure which one your system is, look at what it produces, not at what it is meant to do.
Twitter Alternatives!
Thanks for reading this week. Quickly wanted to share that, if you’re concerned about the trajectory of Twitter, you can find me on other platforms too. Links below! And that includes a new Instagram account exclusively for this newsletter and very tightly related work and projects. Hope you’ll find me over there!