Key takeaways
Europe can lead in AI innovation with all its talent and research, but not without a more nuanced copyright debate, where misunderstandings currently dominate.
Copyright protects original expression fixed in a work, but not the ideas that underlie it.
No additional rules are needed. The EU already has all the regulatory tools needed to protect copyright in the age of artificial intelligence.
The emergence of generative artificial intelligence (AI) represents a major opportunity for the European Union to regain its competitiveness and shape the future of technology, building on Europe’s rich talent pool, leading education and research institutions and access to computing power.
But to seize this opportunity, we need to look beyond the simplifications that prevail in the EU debate about using copyrighted content to train AI models. Let’s look at some common misconceptions, and at how to view this debate and its wider societal impacts more accurately.
1. “Generative AI models contain a copy of the training data” – myth or truth?
Misconception. Generative AI systems don’t store compressed or bit-for-bit copies of their training data inside the model. Instead, they use mathematical techniques to learn patterns and concepts, which are encoded as numerical parameters, or weights. For example, when trained on text data, these models adjust their parameters to reflect the probability of certain word combinations, which allows them to generate coherent responses.
Like a person who reads many books on a particular topic and then writes a book with their own unique take on that topic, generative AI systems learn patterns from their training material rather than copying it, and can therefore generate new content.
Exposure to certain content during training can affect what a model generates later, because its output is probabilistic. For example, if a model is exposed to tens of thousands of images of cats during training, it can learn the characteristics of a “cat” and will therefore be more likely to generate an accurate image of a cat when prompted to do so.
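The idea that parameters reflect the probability of word combinations can be illustrated with a deliberately tiny sketch. It is not how production generative models are built (those rely on neural networks with very large numbers of weights); the function name, the three-sentence corpus and the bigram approach are all assumptions chosen only for illustration. The sketch shows two things from the text above: what the "model" keeps is a set of numbers, and words seen more often during training become more likely in the output.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Estimate P(next_word | word) from example sentences (toy illustration)."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        # Count how often each word is directly followed by each other word.
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    # Normalise the counts into probabilities; these numbers play the role
    # of the model's "parameters" in this toy setting.
    return {
        word: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for word, followers in counts.items()
    }


# Hypothetical training data, invented for this example.
corpus = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "the dog sat on the rug",
]
model = train_bigram_model(corpus)

# "cat" follows "the" in 2 of 6 cases, "dog" in 1 of 6.
print(round(model["the"]["cat"], 3), round(model["the"]["dog"], 3))
```

In this toy corpus, "cat" appears after "the" twice as often as "dog" does, so the learned probability for "cat" is twice as high. Adding more cat sentences to the corpus would shift the numbers further in that direction.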
2. “Knowledge, facts, ideas and information can circulate freely and cannot be protected by copyright” – myth or truth?
True. Fundamental rights frameworks such as the Universal Declaration of Human Rights, the Convention for the Protection of Human Rights and Fundamental Freedoms and the Charter of Fundamental Rights of the European Union uphold everyone’s right to access and disseminate information.
Facts, ideas, knowledge and information cannot be copyrighted. As the debate on AI developments continues in Europe, it is important to keep upholding this principle, which is also reflected in copyright law.
3. “Copyright law protects data – that’s it” – myth or truth?
Misconception. Copyright law protects original expression fixed in a tangible medium, but not the underlying ideas, facts, or information. In other words, you may not reproduce someone else’s copyrighted work without their permission, but you are free to learn as much as you can from it. Making this distinction is important to prevent copyright overreach and to preserve freedom of expression and information.
The EU’s recent Copyright Directive and AI Act also recognise this. The delicate balance struck by the text and data mining exception in the former should not be weakened, so as to avoid interpretations that go against the spirit of the law and unintended consequences for fundamental rights.
4. “All governments agree that copyright holders should be able to say no to the training of AI models” – myth or truth?
Misconception. Of the major jurisdictions in the AI race, only the European Union so far gives rights holders a legal right to opt out of text and data mining (TDM) for training purposes. By contrast, countries such as the United States and Japan (as well as Singapore, South Korea, Malaysia, Israel and Taiwan) have adopted exceptions that help foster innovation and data access without this opt-out right.
This disconnect between the EU and the rest of the world creates legal uncertainty, impacting the competitiveness of the European AI industry and preventing European companies and users from accessing the latest innovations. Balancing the interests of rights holders with the latest technological advances is always complex and requires a nuanced approach. International cooperation and collaboration are therefore key to mitigating the current uncertainty surrounding the use of data, including copyrighted content.
5. “Rights holders have no way to prevent their data from being included in the training set” – myth or truth?
Misconception. Rights holders can use the universally accessible and robust robots.txt protocol to instruct web crawlers not to include their content. Some rights holders may find the protocol’s level of granularity technically limiting, but the technology and creative industries can work together to develop more targeted solutions and standards.
Large technology companies already offer more advanced tools for rights holders who want to exclude their data from training sets. Finding the right technical solutions is clearly in the interest of both the technology and creative sectors.
Furthermore, it is important to clarify that rights holders’ use of opt-outs should not prevent TDM activities permitted by law. Contrary to another common misconception, rights holders cannot object to TDM in all cases, for instance when it is carried out for research or accessibility purposes. Addressing these challenges across sectors is essential to ensure compliance and foster innovation.
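As a rough sketch of how this works in practice, the snippet below uses Python’s standard `urllib.robotparser` module to check a robots.txt file the way a well-behaved crawler might. The crawler names `ExampleAIBot` and `ExampleSearchBot`, the `example.com` URL and the helper `may_crawl` are hypothetical and chosen only for illustration; real crawlers publish their own user-agent tokens, and the protocol relies on crawlers choosing to honour the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one (made-up) AI training crawler
# while leaving the site open to every other crawler.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file content as a list of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


page = "https://example.com/article.html"
print(may_crawl(ROBOTS_TXT, "ExampleAIBot", page))      # False
print(may_crawl(ROBOTS_TXT, "ExampleSearchBot", page))  # True
```

The rules here apply per crawler and per path prefix, which hints at the granularity issue mentioned above: the file can say which crawlers may visit which paths, but it has no built-in way to express what the collected content may be used for.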
6. “Rights holders only want to license content to AI companies” – myth or truth?
Misconception. While some rights holders offer licenses to use their works for AI development, many others are not ready for AI licensing, either practically or conceptually. Recent data shows that the majority of websites and rights holders do not block access to their data for AI training and do not see the need to opt out. Surveys of content creators also show that their views on the use of their works for AI training are far more nuanced than the black-and-white picture some would have you believe.
Indeed, a slowdown in generative AI innovation caused by cumbersome licensing of AI training data could harm the media and creative sectors, which will be the first to benefit from this type of innovation. For example, many media companies and professionals are already exploring how to use AI for original content creation. Balancing the interests of the creative sector with technological advances and fundamental rights remains essential to fostering this type of innovation.
Conclusion
Navigating the intersection of generative AI and copyright requires all parties involved to reconcile competing interests while upholding fundamental freedoms and important principles. Addressing misunderstandings, clarifying legal frameworks, and facilitating cross-sector collaboration are key to ensuring fair and sustainable AI development in the EU.
Europe already has all the regulatory tools necessary to protect copyright in the AI era – stakeholders just need to work together to ensure that everyone can realise the full potential of generative AI.
An abridged version of this article was first published in German in Tagesspiegel, in “Background Digitalisierung & KI – Generative KI und das Urheberrecht: Mythen und Fakten”.