Fining Big Tech Isn't Working. Make Them Give Away Illegally Trained LLMs As Public Domain
Opinion Last year, I wrote a piece here on El Reg about being murdered by ChatGPT, as an illustration of the potential harms of misusing large language models and other forms of AI.
Since then, I have spoken at events across the globe on the ethical development and use of artificial intelligence – while still waiting for OpenAI to respond to my legal demands in relation to what I've alleged is the unlawful processing of my personal data in the training of their GPT models.
In my earlier article, and my cease-and-desist letter to OpenAI, I stated that such models should be deleted.
Essentially, global technology corporations have decided, rightly or wrongly, that the law can be ignored in their pursuit of wealth and power.
Household names and startups have scraped, and are still scraping, the internet and media to train their models, typically without paying for it and while arguing they are doing nothing wrong. Unsurprisingly, a number of them have been fined or are settling out of court after being accused of breaking rules covering not just copyright but also online safety, privacy, and data protection. Big Tech has brought private litigation and watchdog scrutiny upon itself, and potentially spurred new laws to fill in any regulatory gaps.
But for them, it's just a cost of business.
Another way forward
There's a principle in the legal world, in America at least, known as the "fruit of the poisonous tree," under which, simply put, evidence is inadmissible if it was obtained illegally. That evidence cannot be used to gain an advantage. A similar line of thinking could apply to AI systems: illegally built LLMs perhaps ought to be deleted.
Machine-learning companies are harvesting fruit from their poisonous trees, gorging themselves on those fruits, getting fat from them, and using their seeds to plant yet more poisonous trees.
In the time between my previous piece here on El Reg and now, however, and after careful consideration, I have come to a different opinion with regard to the deletion of these fruits. Not because I believe I was wrong, but because of moral and ethical considerations arising from the potential environmental impact.
Research by RISE, a Swedish state-owned research institute, states that OpenAI's GPT-4, a model with 1.7 trillion parameters, was trained on 13 trillion tokens using 25,000 Nvidia A100 GPUs, at a cost of $100 million, over 100 days, consuming a whopping 50 GWh of energy. That is a lot of energy; it's roughly the equivalent power use of 4,500 homes over the same period.
From a carbon emissions perspective, RISE states that such training (if carried out in northern Sweden's more environmentally friendly datacenters) is the equivalent of driving an average combustion-engine car around the Earth 300 times; if trained elsewhere, such as in Germany, that impact increases 30-fold. And that's just one LLM version.
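For context, here is a rough back-of-the-envelope sketch of what that 50 GWh figure implies, using only the numbers quoted above; the per-home split simply divides the reported total across the 4,500 homes mentioned, rather than assuming any particular household consumption figure.

```python
# Back-of-the-envelope check of the GPT-4 training energy figures above.
# The 50 GWh total and 100-day duration are the RISE figures cited in the
# article; everything derived below is simple arithmetic on those numbers.

TRAINING_ENERGY_GWH = 50   # reported total energy used for training
TRAINING_DAYS = 100        # reported training duration
HOMES = 4_500              # homes used in the article's comparison

energy_kwh = TRAINING_ENERGY_GWH * 1_000_000   # 1 GWh = 1,000,000 kWh
hours = TRAINING_DAYS * 24

# Average continuous power draw over the training run (~20.8 MW)
avg_power_mw = energy_kwh / hours / 1_000
print(f"Average power draw: {avg_power_mw:.1f} MW")

# Energy attributed to each of the 4,500 homes over the same 100 days (~11,100 kWh)
per_home_kwh = energy_kwh / HOMES
print(f"Energy per home over {TRAINING_DAYS} days: {per_home_kwh:,.0f} kWh")
```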
In light of this information, I am forced to weigh the ethical impact on the environment should such models be deleted under the "fruit of the poisonous tree" doctrine, and it is not something I can reconcile: the environmental cost is, in my view, simply too significant.
So what can we do to ensure those who scrape the web for commercial gain – in this case, to train AI models – do not profit, do not gain an economic advantage, from such controversial activities? And if disgorgement through deletion is not viable for the reasons given above, how can we incentivize companies to treat people's privacy and creative work with respect, and to stay within the law, when developing products and services?
After all, if there is no meaningful consequence – as stated, today's monetary penalties are merely line items for these companies, which have more wealth than some nations, and as such are ineffectual as a deterrent – we will continue to see this behavior repeated ad infinitum, which simply maintains the status quo and makes a mockery of the rule of law.
Get their attention
It seems to me the only obvious solution here is to remove these models from the control of executives and put them into the public domain. Given they were trained on our data, it makes sense that they should become a public commons – that way we all benefit from the processing of our data, and the companies, particularly those found to have broken the law, see no benefit. The balance is returned, and we have a meaningful deterrent against those who seek to ignore their obligations to society.
Under this solution, OpenAI, if found to have broken the law, would be forced to put its GPT models in the public domain and even be banned from selling any services related to those models. This would come at a significant cost to OpenAI and its backers, which have spent billions developing these models and associated services. They would face a much higher risk of not being able to recover those costs through revenue, which in turn would force them to do more due diligence with regard to their legal obligations.
If we then extend this model to online platforms that sell their users' data to companies such as OpenAI – banning such access under threat of disgorgement – they too would think twice before handing over personal data and intellectual property.
If we remove the ability for organizations to profit from illegal behavior, while also recognizing the ethical issues of destroying the poisonous fruit, we might finally find ourselves in a situation where companies with immense power are forced to comply with their legal obligations simply as a matter of economics.
Of course, such a position is not without its challenges. Some businesses try to wriggle out of fines and other punishment by arguing they have no legal presence in the jurisdictions bringing down the hammer. We would likely see that happen with the proposed approach.
For that purpose, we need global cooperation between sovereign states to effectively enforce the law; this could be done through treaties similar to the Mutual Legal Assistance Treaties (MLATs) that exist today.
As for whether current laws have the powers to issue such penalties, that is debatable. While Europe's GDPR, for example, affords data protection authorities general powers to ban the processing of personal data (under Article 58(2)(f)), it doesn't explicitly provide powers to force controllers to put the data into the public domain. As such, any such effort would be challenged, and such challenges take many years to resolve through the courts, allowing the status quo to remain.
However, the European Commission's new big stick is the Digital Markets Act (DMA), which includes provisions allowing the Commission to extend its scope. But this would only apply to companies under the DMA's jurisdiction, which is currently limited to just Alphabet, Amazon, Apple, Booking, ByteDance, Meta, and Microsoft.
We cannot allow Big Tech to continue to ignore our fundamental human rights. Had such an approach been taken 25 years ago in relation to privacy and data protection, arguably we would not have the situation we have today, where some platforms routinely ignore their legal obligations to the detriment of society.
Legislators did not understand the impact of weak laws or weak enforcement 25 years ago, but we have enough hindsight now to ensure we don't make the same mistakes moving forward. The time to regulate unlawful AI training is now, and we must learn from past mistakes to ensure we provide effective deterrents and consequences for such ubiquitous law breaking in the future.
As such, I will be dedicating much of my lobbying time in Brussels moving forward to pushing this approach, in the hope of amending existing legislation or passing new legislation to grant such powers, because it is clear that, without appropriate penalties to act as a deterrent, these companies will not self-regulate or comply with their legal obligations while the profits from unlawful business practices far outweigh the consequences. ®
Alexander Hanff is a computer scientist and leading privacy technologist who helped develop Europe's GDPR and ePrivacy rules.