Generative AI and the Copyright of works used for training

Legal challenges in the rise of Generative AI: copyrights, varied legislation, and proposals to balance innovation and compensation for creators

By Leandro Bissoli, DPO and Ana Piergallini, partner and lawyer at Peck Advogados, respectively

The rise in the use of ChatGPT, OpenAI's generative AI, is remarkable. The company, considered the most disruptive in Silicon Valley¹, announced an update to its database, extending it beyond 2021². This expansion implies the use of an even greater volume of data, including intellectual works protected by copyright, published up to the date of the update.

As OpenAI improves its algorithms and natural language models, it is facing an increase in legal disputes related to possible infringement of copyright of the works used in training these models, as in the recent class action filed by the Authors Guild³. The heart of the dispute lies in the ability of these AI models to produce results very close to protected works, raising legitimate concerns about unauthorized reproductions and their impacts on the regular exploitation of original works.

Understanding how language models work, especially the so-called Large Language Models (LLMs), which are designed to perform natural language processing tasks, as well as the GPT (Generative Pre-trained Transformer) architecture, is fundamental to understanding these disputes. The ability of these models to deliver high-quality results is directly linked to the significant volume of data used in their training.

Legislation governing the use of protected works varies from one jurisdiction to another. Japan, for example, whose copyright legislation was updated in 2009, allows the use of protected works to train AI models, with the aim of improving machine learning. In the United States, where several legal disputes are underway, the fair use exception has been applied. In Brazil, Bill (PL) No. 2338/2023, currently under consideration in the National Congress, proposes a limitation on copyright in Art. 42, with the aim of encouraging scientific research in the public interest and innovation. The exception is justified as long as the use is not intended for expressive purposes and complies with the requirements of the international conventions to which Brazil is a party.

The application of the concept of “fair use” seems to be an appropriate approach in this context, as long as some reservations are observed, such as: (i) guaranteeing the moral rights of authors, with emphasis on the right of paternity, meaning that due credit must be attributed to the original author, which is a challenge when dealing with a vast set of input data; and (ii) avoiding the generation of results that substantially resemble the original works or that emulate the creation of a specific author, which is the fine line between legitimate use and infringement or plagiarism.

Furthermore, the duty of transparency established by the European regulation and the national bill must be observed. The use of content generated by AI must be accompanied by warnings about automated generation, enabling users to make informed decisions about their consumption.

One possible solution to reconcile the interests involved is the implementation of a collective management model similar to that used by streaming platforms. As Nicola Lucchi argues in a recent scientific article published in Cambridge Magazine⁴, agreements for the shared use of data can serve as a reference for resolving the use of protected works in AI training, ensuring both compliance with copyright laws and the rights of authors, supported by the necessary licenses. Additionally, compensation programs, such as revenue sharing or royalty payments, can be created to compensate the authors of copyright-protected works used for AI training.

While such discussions are being held, big tech companies seek to reassure their users with safeguards that encourage safe use of their AI solutions. Google, for example, has just announced its commitment to indemnifying users in cases of third-party claims, especially copyright claims, reinforcing that it follows responsible AI practices.⁵

There is therefore still a long way to go, especially with regard to the processing of input data. The results of the legal actions taken against OpenAI and other generative AI developers will certainly contribute to the ongoing debate and help the community establish clear boundaries between what is inspiration and what constitutes infringement.

Creating such parameters is essential to strike a balance between the copyright system and technological advancement, while ensuring that content creators are fairly compensated for their work.

Source: Editorial Analysis

