With the introduction of the EU AI Act, companies that train AI models will need to take care over the datasets they use: certain copyrighted works may no longer be available for training AI models.
Introduction
Three main aspects of generative AI applications are relevant to copyright protection: (1) machine learning using protected works (the input side), (2) the protectability of works created with the assistance of generative AI (the debate over copyright protection of AI-generated works), and (3) potential infringement of pre-existing works by AI-generated output (the output side). This GT Alert focuses on legal issues related to the input side.
On the input side, a key concern is whether publicly available copyrighted works may be used to train AI models deployed by commercial companies.
Because AI is designed to mirror human intelligence, the input side of AI can be likened to a person reading a book or listening to music. Gaining knowledge and inspiration in this way is seamless and automatic. No law prohibits a person from learning, directly or indirectly, from those books or that music and using the knowledge and inspiration gained for commercial purposes.
Compared with the natural way humans acquire knowledge, the people who train AI models have more control over whether and how the AI learns from the data they feed it. Another difference between AI and human intelligence is AI's ability to build on seemingly unlimited amounts of data. As a result, AI has the potential to exponentially accelerate technological processes and innovation.
This raises a new question in AI regulation: to what extent should machine learning be restricted in order to respect intellectual property rights?
Machine Learning Under the EU AI Act
The EU AI Act, approved by the EU Council on 21 May 2024, is the first attempt to answer this question. The EU AI Act includes a provision that treats machine learning as "text and data mining" (TDM) within the meaning of the EU Text and Data Mining Directive.1 Machine learning on protected works is therefore permitted, provided that:
the person programming the machine learning functionality has lawfully accessed the content for the purposes of text and data mining; and
the holders of copyright and related rights and/or the database owners have not expressly reserved the use of their works for text and data mining (the so-called opt-out mechanism).
The EU AI Act is due to enter into force in 2024 and to become fully applicable 24 months after that. However, because a TDM exception already exists under the EU Text and Data Mining Directive, that exception could already be applied to machine learning in anticipation of the interpretation set out in the EU AI Act.
Opt-out Mechanism
It is not yet clear what a legally valid opt-out would look like, but various organizations, including the Dutch collecting society Pictoright (photography) and the French collecting society Sacem (music), have drafted general reservation of rights statements that allow creators to opt out of having their works used to train AI models. Additionally, many websites and images on social media now feature similar opt-out statements.
No case law or other authoritative guidance yet establishes whether such statements are sufficient to meet the opt-out threshold, but the trend is likely to intensify now that the EU AI Act has been adopted.
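Because no format has been confirmed as legally sufficient, a data-collection pipeline might treat any machine-readable reservation signal conservatively. The sketch below assumes, purely for illustration, that a site's robots.txt rules are one such signal, and uses Python's standard `urllib.robotparser` module to test whether a crawler may fetch a URL. The crawler name `ExampleTrainingBot`, the helper `may_fetch`, and the sample rules are hypothetical. Whether a robots.txt rule amounts to a valid opt-out under the TDM exception is not settled.

```python
from urllib.robotparser import RobotFileParser


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    # Parse rules from text already in hand; no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules: the site blocks one named training crawler
# and allows every other user agent.
SAMPLE_ROBOTS = """\
User-agent: ExampleTrainingBot
Disallow: /

User-agent: *
Allow: /
"""

if __name__ == "__main__":
    page = "https://example.com/gallery/photo.jpg"
    print(may_fetch(SAMPLE_ROBOTS, "ExampleTrainingBot", page))
    print(may_fetch(SAMPLE_ROBOTS, "OtherBot", page))
```

A check of this kind covers only one possible signal. Reservation statements written in a site's terms of use or published by a collecting society would need separate review.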
Summary
Although the EU AI Act and its reading of the TDM exception do not yet formally apply, AI system providers and developers should consider implementing measures and configurations to avoid infringement claims by rights holders. Here are four further things to consider:
Obtain lawful access to content. When scraping the web or reviewing pre-built datasets, check whether the content used for machine learning purposes is subject to access restrictions, such as paywalls or other (technical) restrictions.
Check for the existence of opt-out reservations. Consider making sure that rights holders have not reserved the right to make reproductions for TDM purposes, for example by searching the websites of collecting societies to see which works are covered by an opt-out reservation.
Include necessary contractual protections. With regard to machine learning, two types of contracts are particularly relevant: (1) contracts with owners of datasets used to train the AI model, and (2) contracts with customers of the AI model. In either case, consider crafting contracts that provide a fair and balanced allocation of liability for inadvertent uses of opted-out copyrighted works.
Put guardrails around your AI models to prevent use for purposes other than TDM. For any copyrighted content that may be used for TDM purposes, consider implementing technical and organizational restrictions to ensure the content is used only for training AI models.
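As a rough illustration of the first two considerations, the two conditions of the TDM exception can be expressed as a filter over dataset records. This is a minimal sketch under stated assumptions: the `SourceRecord` type and its fields `lawfully_accessed` and `tdm_reserved` are hypothetical, and the code does not show the harder task of determining those values for each work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceRecord:
    url: str
    # Hypothetical flag: content was obtained without bypassing a paywall
    # or other access restriction.
    lawfully_accessed: bool
    # Hypothetical flag: the rights holder has reserved use for TDM (opted out).
    tdm_reserved: bool


def eligible_for_training(records):
    """Keep only records that were lawfully accessed and carry no opt-out."""
    return [r for r in records if r.lawfully_accessed and not r.tdm_reserved]


if __name__ == "__main__":
    records = [
        SourceRecord("https://example.com/a", True, False),
        SourceRecord("https://example.com/b", True, True),    # opted out
        SourceRecord("https://example.com/c", False, False),  # access restricted
    ]
    for record in eligible_for_training(records):
        print(record.url)
```

Keeping the two flags separate, instead of a single "allowed" field, preserves the reason a work was excluded, which may help when allocating liability in dataset and customer contracts.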
1 Article 52c(1) of the EU AI Act and Article 4 of the EU Text and Data Mining Directive.