
I believe that copyright is not only insufficient to improve the situation, but will greatly worsen it. It does not stop being a tool in the service of corporations just because it is now inconvenient for some big tech companies. There are other large corporations for which it is convenient, so the game continues at the upper levels.

The idea of appealing to an even stricter copyright is harmful for practical reasons. Realistic financial compensation for those who produce images or texts is simply unthinkable. Since billions of images or texts are needed to develop these technologies, the contribution of any individual is minimal, and paying each contributor even one euro is unimaginable, as it would make the already high initial investment impossible. Data would therefore be paid very little, and only those who already own it in large quantities would benefit – for example, Shutterstock is paying creators something like 0.01 per image used for training.

The alternative, then, is for everyone to refuse to provide their data, making these technologies much less effective (as with Adobe Firefly, which uses only proprietary images). Even if we consider this scenario commendable or plausible, it presents serious problems.

The first point is that artists who are not already well-established have no bargaining power. Just as they accept relatively low royalties and minimal or non-existent advances, they will also accept a training clause when asked to. Laws should be evaluated on their practical effectiveness, not just their theoretical basis.

The second point is that even these 'weaker' AIs have a significant impact on the creative market, and would be harmless only for the most original works – the only ones that, in my opinion, are not at risk anyway. The bulk of the creative market, and of the earnings of those working in it, lies not in works of art but in commercial works made to client specifications: exactly the things that AIs built on proprietary datasets can do very well. The impact on work will be the same either way, but this path prevents artists without deep pockets from using AIs to their full potential, because they will always have to rely on proprietary, unmodifiable software.

The third point is the impact on open-source technology. Currently, open-source models can be customized through fine-tuning, but only as long as the use of data for training remains free. Paid datasets would also further reduce the variety of companies producing these technologies, by raising the economic barrier to entry. Building these machines, like producing computers, cameras, and thousands of other common tools, requires significant investments that only private or public organizations can afford. The higher the investment threshold, the fewer companies can clear it.

The fourth point is geopolitical. If we do not consider the use of data fair, we must take into account that in Japan it is already legal. If we do not consider AI-generated images a human creative product, we must consider that in China they are legally regarded as such. Meanwhile, various publishing giants are demanding their share of the profit, as with the New York Times recently suing OpenAI. We are not talking about small authors with one book in the dataset – an utterly insignificant contribution given the scale needed to train an LLM – but about publishing giants that can press substantial economic demands from the mountain of rights they hold. In the West, I believe the direction will be the usual individualistic one: data as private property. But we do not live in a world of perpetual peace where all states can agree on such matters; we live in a competitive context. The East has a more collectivist view and could treat data as a public good.

Then there's a social reason, which for me is the most important. The cognitive world of AIs is determined by their data, and that is exactly why I want them to have mine, along with everyone else's – not just the data chosen and/or purchased by companies. Otherwise, we will be even more limited and harmed by the cultural biases of AIs, on top of our own. It is vital that datasets be open and modifiable.

To conclude, if training were restricted by copyright, we would face the unprecedented scenario in which pirate, Eastern, or non-profit AIs are more powerful than Western commercial ones. Would we really use the latter?
