Navigating the AI Storm: New York Times vs. Microsoft & OpenAI – The Inevitable Clash of Copyright and Artificial Intelligence
- Natasha Karanja |
- April 5, 2024 |
- Artificial Intelligence,
- Copyright
Introduction
In the ever-evolving landscape of Artificial Intelligence (AI), the recent clash between The New York Times and tech giants Microsoft Corp and OpenAI has stirred discourse around the intersection of intellectual property rights and AI, sending shockwaves through the realms of copyright and cutting-edge technology. The case forms part of an emerging list of AI-copyright lawsuits filed against OpenAI and Microsoft Corp. The list includes, but is not limited to: Tremblay v OpenAI (2024), Silverman et al v OpenAI Inc. (2023), and Doe et al v GitHub Inc. (2023). The lawsuit illuminates the contentious issues arising from the use of large language models (LLMs), particularly the GPT-4 model, in the development of generative AI tools such as ChatGPT.1
At the heart of the matter lies The New York Times’ accusation of copyright infringement by Microsoft and OpenAI. The claim centers on the training of LLMs, alleging that these models copied “millions of The Times’s copyrighted news articles, in-depth investigations, opinion pieces, reviews, how-to-guides, and more.”2 This ties in with generative AI’s need to consume large amounts of data, since it learns in a manner similar to human learning; within the AI context, however, “learning experience equates to data.”3
The Times seeks substantial damages, a permanent injunction, and the drastic measure of destroying OpenAI’s Generative Pre-trained Transformer (GPT) models and associated training data that incorporate copyrighted material from The Times.4 This legal battle takes on added significance given The Times’ history of pursuing matters to the United States Supreme Court in defense of journalistic expression. The potential repercussions of this case extend beyond mere financial consequences for Microsoft and OpenAI. The stakes involve a potential reevaluation of Microsoft’s investment in OpenAI and the destruction of valuable intellectual property integral to the GPT models.
The Clash of Titans: The Legal Battle Unveiled
The Times’ argument rests on the alleged copying of copyrighted material, spanning news articles, investigations, opinions, reviews, and various other content categories. The lawsuit emphasizes the significant reproduction of The Times’ material by the GPT-4 LLM, which allegedly generates near-verbatim copies in response to specific prompts. One source of such training data is Common Crawl, a publicly available repository of web pages collected by systematically crawling the World Wide Web, which is used to help train language models. In the development of LLMs by Microsoft and OpenAI, the phenomenon of memorization emerges, where models repeat portions of their training data when prompted appropriately.5 This behavior raises concerns about potential copyright infringement, as the models may reproduce text without authorization.
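To make the idea of “near-verbatim” reproduction concrete, the following minimal Python sketch measures how much of a generated passage repeats word sequences found in a source text. It is purely illustrative: the function names, the five-word window, and the sample strings are assumptions made for this example, and it is not the method used in the complaint or by any party to the case.

```python
def word_ngrams(words, n):
    """Return the set of all runs of n consecutive words."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def verbatim_overlap(source, output, n=5):
    """Fraction of the output's n-word runs that also occur in the source.

    A value near 1.0 suggests the output largely repeats the source;
    a value near 0.0 suggests little word-for-word copying.
    Matching is case-insensitive; punctuation is not stripped.
    """
    source_words = source.lower().split()
    output_words = output.lower().split()
    output_grams = word_ngrams(output_words, n)
    if not output_grams:
        # Output is shorter than n words, so there is nothing to compare.
        return 0.0
    shared = output_grams & word_ngrams(source_words, n)
    return len(shared) / len(output_grams)


if __name__ == "__main__":
    # Hypothetical sample strings, invented for this illustration only.
    article = "the committee voted on tuesday to approve the new budget after a long debate"
    generated = "the committee voted on tuesday to approve the new budget after a long debate"
    print(verbatim_overlap(article, generated))
```

A score like this only flags word-for-word repetition. Whether such repetition amounts to infringement remains a legal question that turns on the fair use factors discussed later in this article.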
The case takes a unique turn as The Times seeks not only substantial damages but also a permanent injunction and the destruction of OpenAI’s GPT models and associated training data. The latter demand adds a layer of complexity, signaling the potential obliteration of valuable intellectual property. The case becomes not just a legal battle but a high-stakes clash with far-reaching consequences for the AI industry.
The New York Times’ Quest for Protection: A Historical Perspective
The New York Times, known for its steadfast commitment to journalistic expression, has a historical track record of taking matters to the highest court when it deems necessary. Landmark Supreme Court decisions, such as New York Times Co. v. United States (1971) and New York Times Company v. Sullivan (1964), highlight the newspaper’s determination to protect fundamental principles, including freedom of the press and First Amendment rights.
In the current legal battle, The Times underscores the potential threat posed by GPT-powered products to its ability to produce journalistic content. The lawsuit emphasizes the diversion of subscribers by AI-generated content, framing it as unfair competition that undermines The Times’ market position. The assertion that The Times attempted negotiations with Microsoft and OpenAI before resorting to legal action signals a belief that litigation is the only remaining recourse to safeguard its copyrights.
The Legal Landscape: Four Factors of Fair Use
Central to the legal discourse surrounding the case are the four factors determining fair use, a critical doctrine embedded in Section 107 of the US Copyright Act. These factors are:
- Purpose and character of the use (transformative use): Whether the use is of a commercial nature or for nonprofit educational purposes, and the extent to which it transforms the original.
- Nature of the copyrighted work: The characteristics of the copyrighted material, with some works enjoying broader protection.
- Amount and substantiality of the portion used: The quantity and significance of the copyrighted material used in relation to the work as a whole.
- Effect of the use on the potential market for or value of the copyrighted work: The impact of the use on the market value of the copyrighted material, considering both direct and indirect effects.
Relevant precedents, including Authors Guild v. Google and Andy Warhol Foundation for the Visual Arts v. Goldsmith,6 illustrate the intricacies of fair use that carry over to AI-generated content. Transformative use and market substitution emerge as the pivotal questions in determining the legality of GPT-powered products.
Intersection of AI and Intellectual Property: Navigating the Chaotic Crossroads
Large language models like GPT-4 are designed to learn from vast datasets that include copyrighted material, which introduces a complex dynamic to the realm of copyright law in the digital age. Central to the issue is the challenge of discerning whether the use of AI models, particularly in generating content resembling copyrighted works, falls under fair use. Fair use, a legal doctrine intended to balance the rights of content creators and users, permits limited use of copyrighted material under specific circumstances. However, the application of fair use becomes a nuanced undertaking, especially when dealing with AI-generated content.
While AI models may produce transformative outputs, concern arises when these outputs closely mimic or reproduce copyrighted works, potentially leading to market substitution and financial harm to the original content creators. Besides copyright, the complaint underscores the importance of addressing trademark issues in AI development. The defendants face accusations of trademark dilution, with the plaintiff arguing that the potential for AI chatbots to “hallucinate” and misattribute information to The New York Times poses a threat to the integrity of journalism.7 It is claimed that these “hallucinations” mislead users regarding the source of the information they obtain, causing them to incorrectly believe that the information provided has been vetted and published by The Times.8 Such “hallucinations” potentially dilute the reputation for accuracy of The New York Times brand and may ‘undermine and damage The Times’s relationship with its readers and deprive The Times of subscription, licensing, advertising, and affiliate revenue.’9
The case holds broader implications for the future landscape of intellectual property law, particularly in addressing the evolving challenges posed by advanced AI technologies. Striking a delicate balance between fostering innovation in AI and safeguarding the rights of content creators becomes imperative. Pertinent issues concerning AI and copyright are further explored in CIPIT’s AI and Copyright booklet, which provides insights into navigating the intersection of AI technology and copyright law. As AI continues to advance, legal frameworks must adapt to ensure a fair and equitable landscape for all stakeholders.
Licensing agreements emerge as a critical component in this legal saga. AI developers, including Microsoft and OpenAI, offer licenses for products that incorporate copyrighted material.10 The lawsuit suggests that the demand for licensing from traditional media outlets may diminish if AI-powered technologies can replicate or substitute their content effectively. This shift in dynamics challenges established revenue streams and disrupts the traditional licensing model.11
Conclusion
In conclusion, the intersection of AI and intellectual property introduces complexities that legal systems must navigate adeptly. As AI technologies progress, legal frameworks need to evolve to address the novel challenges posed by AI-generated content. The outcome of cases like the lawsuit against Microsoft and OpenAI could significantly shape the future landscape of AI and its intricate relationship with copyright law. The evolving narrative will undoubtedly unfold in the dynamic and multifaceted intersection of artificial intelligence and intellectual property rights.
Image is from dev.to
1 Case No. 1:23-cv-11195 (S.D.N.Y. Dec. 27, 2023)
2 ibid.
3 Arni v & Jayachandran J, Traversing the Ethical Landscape of Data Scraping for AI (University of Maryland 2023).
4 ibid.
5 Case No. 1:23-cv-11195 (n1) …80.
6 Case No. 13-4829-cv (2d Cir 2015) and Case No. 21-869 [2023].
7 Michael Finney, Hallucinating AI may dilute trademark value (2 January 2024) <https://www.bennettphilp.com.au/blog/hallucinating-ai-dilute-trade-mark-value> accessed 12th February 2024.
8 ibid.
9 Case No. 1:23-cv-11195 (S.D.N.Y. Dec. 27, 2023) paragraph 5.
10 Authors Alliance, Licensing research content via agreements that authorize uses of Artificial Intelligence [2024] <https://www.authorsalliance.org/2024/01/10/licensing-research-content-via-agreements-that-authorize-uses-of-artificial-intelligence/> accessed 12th February 2024.
11 ibid.