
I believe that copyright is not only insufficient to improve the situation, but will greatly worsen it. It does not stop being a tool in the service of corporations just because it is now inconvenient for some big tech companies. There are other large corporations for which it is convenient, so the game continues at the upper levels.

The idea of appealing to an even stricter copyright is harmful for practical reasons. Realistic financial compensation for those who produce images or texts is simply unthinkable. Since billions of images or texts are needed to develop these technologies, the contribution of any individual is minimal, and paying each contributor even one euro is unimaginable, as it would make the already high initial investment impossible. Data would therefore be paid very little, and only those who already own it in large quantities would benefit – for example, Shutterstock is paying creators something like 0.01 per image used for training.

The alternative, then, is for everyone to refuse to provide their data, making these technologies much less effective (as with Adobe Firefly, which uses only proprietary images). Even if we consider this scenario commendable or plausible, it presents serious problems.

The first point is that artists who are not already well-established have no bargaining power. Just as they accept relatively low royalties and minimal or non-existent advances, they will also accept a training clause when asked to. Laws should be evaluated on their practical effectiveness, not just their theoretical basis.

The second point is that even these 'weaker' AIs have a significant impact on the creative market, and would be harmless only for the most original works – the only ones that, in my opinion, are not at risk anyway. The bulk of the creative market, and of the earnings of those working in it, lies not in works of art but in commercial works made to client specifications: exactly the things that AIs built on proprietary datasets can do very well. The impact on work will be the same either way, but this path prevents artists without deep pockets from using AIs to their full potential, because they will always have to rely on proprietary, unmodifiable software.

The third point is the impact on open-source technology. Currently, open-source models can be customized through fine-tuning, but only as long as the use of data for training remains free. Paid datasets would also further reduce the variety of companies producing these technologies, by raising the economic barrier to entry. Building these machines, like producing computers, cameras, and thousands of other common tools, requires significant investments that only private or public organizations can afford. The higher the investment threshold, the fewer companies can clear it.

The fourth point is geopolitical. If we do not consider the use of data fair, we must take into account that in Japan it is already legal. If we do not consider AI-generated images a human creative product, we must consider that in China they are legally regarded as such. Meanwhile, various publishing giants are demanding their share of the profit, as with the New York Times recently suing OpenAI. We are not talking about small authors with one book in the dataset – an utterly insignificant contribution given the scale needed to train an LLM – but about publishing giants that can press substantial economic demands from the mountain of rights they hold. In the West, I believe the direction will be the usual individualistic one: data as private property. But we do not live in a world of perpetual peace where all states can agree on such matters; we live in a competitive context. The East has a more collectivist view and could treat data as a public good.

Then there's a social reason, which for me is the most important. The cognitive world of AIs is determined by their data, and that is exactly why I want them to have mine, along with everyone else's – not just the data chosen and/or purchased by companies. Otherwise, we will be even more limited and harmed by the cultural biases of AIs, on top of our own. It is vital that datasets be open and modifiable.

To conclude, if training were restricted by copyright, we would face the unprecedented scenario in which pirate, Eastern, or non-profit AIs are more powerful than Western commercial ones. Would we really use the latter?
