
The Legal Consequences of the AI Copyright Wars of 2025

How will major lawsuits and rulings over AI training data determine the future balance between creators’ rights and artificial intelligence development?

Published November 17th, 2025

Written by Prisha Poddar


Artificial intelligence has ignited a new battle over copyright law as developers train machines on works of human creativity. In 2025, courts are testing whether training large language models on copyrighted materials without authorization constitutes infringement or fair use under US law. These decisions will determine how creators and developers share space and content in the digital age, and they may set the tone for decades of AI development.


The foundation of the argument is 17 U.S.C. § 107, which provides for limited uses of copyrighted works without permission under certain conditions. The statute permits uses “for purposes such as criticism, comment, news reporting, teaching, scholarship, or research,” subject to a four-factor analysis: courts must consider the purpose and character of the use, the nature of the copyrighted work, the amount and substantiality of the portion taken, and the effect on the market for the original. No single factor determines fairness, and judges must weigh them together in light of modern technology and digital habits.


In The New York Times Company v. Microsoft Corporation and OpenAI, Inc. (S.D.N.Y. 2023, ongoing), the Times claimed that OpenAI and Microsoft used its content as training data and reproduced it in ChatGPT outputs without permission. In April 2025, Judge Sidney Stein declined to dismiss key claims, finding that factual disputes over copying and market impact must be resolved as the case proceeds, according to Reuters. The court also rejected OpenAI’s safe harbor argument tied to the statute of limitations. That argument sought to treat earlier acts of copying as time-barred, but the court held that the passage of time did not automatically shield large-scale or continuous training activity from review. By allowing the lawsuit to continue, the ruling signals that courts are willing to scrutinize AI training practices closely and may impose liability for widespread ingestion of copyrighted material.


The same court issued a sweeping preservation order in May 2025, requiring OpenAI to retain all ChatGPT conversation histories. Legal analysts note that the record-breaking order pushes discovery into new and untested realms of data privacy and governance. OpenAI has objected, arguing that such retention is incompatible with its privacy commitments. The lawsuit may determine whether user data logs can be subpoenaed in AI copyright infringement cases and may influence privacy requirements for digital platforms.


In another jurisdiction, Bartz v. Anthropic (N.D. Cal. 2024, decided 2025) tested fair use for AI training. Judge William Alsup ruled that training on legally purchased books could be fair use, but that copying and storing millions of pirated books was not. He held that the pirated material remained a separate issue for adjudication, according to Reuters. The ruling illustrates how § 107’s four factors operate differently in AI scenarios. Alsup further noted the contrast in scale between AI training and human reading and writing: the billions of words ingested by LLMs can have a disproportionate effect on markets for original works. That disproportionate effect refers to the way large-scale ingestion of text lets a model participate in multiple downstream markets, such as journalism, educational materials, marketing copy, genre fiction and reference content, by generating outputs that function as partial substitutes for those works at almost no marginal cost. Though any individual book or article represents only a small part of the training corpus, the aggregated corpus gives the model broad expressive and functional capabilities that reduce demand for certain types of original writing. The scale of training therefore magnifies market impact far beyond what would occur in a traditional fair use analysis focused on a single work.


Later in 2025, Anthropic agreed to a $1.5 billion settlement with thousands of authors, covering more than 500,000 books. Anthropic is a major artificial intelligence company that develops large language models similar to OpenAI’s, training its systems on vast text corpora to generate and analyze human language. Because that training can involve copyrighted books and other written works, Anthropic faces the same legal questions as other AI developers about how training data is acquired and whether it is used with permission. The payout, approximately $3,000 per book, is one of the largest US copyright settlements on record, according to Reuters. Some provisions remain under negotiation, such as how responsibility will be divided among AI creators and publishers, according to BHFS. The settlement sets a financial benchmark that may push other AI developers to negotiate licenses proactively, assess risk more carefully and adjust bargaining strategies to avoid costly litigation.


Courts are also considering model memorization, the tendency of a machine learning model to store and reproduce specific examples from its training data rather than generalizing patterns. A 2025 technical report, Extracting Memorized Training Data from LLMs, shows that some models can reproduce large portions of copyrighted text, potentially running afoul of the market effect and substantiality factors under § 107, according to arXiv. The study outlines how AI models can unintentionally memorize and reproduce expressive content rather than abstract patterns, heightening their exposure to liability.
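To make the concept concrete, here is a minimal sketch of the kind of verbatim-memorization probe such studies describe: prompt a model with the opening of a passage suspected to be in its training data, then measure how closely its continuation matches the real text. This is illustrative only, not the cited report’s method; the Hugging Face transformers pipeline, the small GPT-2 model, the input file name and the 0.9 similarity threshold are all assumptions chosen for the example.

```python
# Minimal sketch of a verbatim-memorization probe (illustrative assumptions:
# the Hugging Face "transformers" library, the small GPT-2 model, and an
# arbitrary similarity threshold; real studies run probes at far larger scale).
from difflib import SequenceMatcher

from transformers import pipeline


def memorization_score(passage: str, prefix_chars: int = 200) -> float:
    """Prompt the model with the start of a passage and measure how closely
    its continuation matches the passage's real ending (1.0 = verbatim)."""
    generator = pipeline("text-generation", model="gpt2")
    prefix, true_ending = passage[:prefix_chars], passage[prefix_chars:]

    # Greedy decoding (do_sample=False) keeps the probe deterministic.
    result = generator(prefix, max_new_tokens=100, do_sample=False)
    continuation = result[0]["generated_text"][len(prefix):]

    # Character-level similarity over the comparable window.
    window = min(len(continuation), len(true_ending))
    return SequenceMatcher(None, continuation[:window],
                           true_ending[:window]).ratio()


if __name__ == "__main__":
    # "suspected_text.txt" is a hypothetical file holding a candidate passage.
    passage = open("suspected_text.txt", encoding="utf-8").read()
    score = memorization_score(passage)
    print(f"similarity to the true continuation: {score:.2f}")
    if score > 0.9:  # arbitrary cutoff for illustration
        print("Near-verbatim match: possible memorized training data.")
```

A single high score proves little on its own; studies like the one cited aggregate such probes across thousands of passages, and it is the rate of near-verbatim reproduction of protected text that feeds the substantiality and market-effect concerns described above.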


The copyright wars have also extended to images. Disney sued the AI image generators Minimax and Midjourney in 2025, alleging that their outputs mirror its copyrighted works. Disney argues that the output itself can trigger liability even where the training material would have been legitimate, according to Bloomberg Law. Music rights are also in dispute: Spotify has complained that AI firms copied artist recordings without permission and has partnered with major labels to develop compliant AI tools, according to Business Insider. These cases illustrate that as AI systems expand across creative domains, courts and rights holders are scrutinizing both the use of copyrighted material in training and the outputs produced, underscoring the broad legal and financial stakes for developers.


Regulators are moving too. In 2025, the U.S. Copyright Office issued guidance stating that human authorship is required for copyright protection: AI can contribute to the creative process, but machine-only works are not eligible. That position is reflected in Thaler v. Perlmutter, in which a federal court affirmed that a work generated solely by AI cannot be registered for copyright, according to Skadden. The decision underscores that AI developers cannot rely on copyright protection for machine-generated works and that businesses must navigate authorship rules carefully when deploying AI in creative industries.


All these developments indicate that 2025 is the year AI copyright law turned a corner. The New York Times lawsuit tests the limits of § 107 through the sheer magnitude of the legal challenge it poses. Bartz v. Anthropic clarifies the line between lawful and unlawful uses of training data. Disney and Spotify extend the debate to images and sound. The Copyright Office and the courts reestablish human authorship as a statutory floor.


If courts rule in favor of developers, AI firms may avoid licensing costs but erode creator control. If courts rule in favor of creators, companies may face demands for mass-scale licenses or technical safeguards that prevent memorization. In either scenario, 2025 confirms that AI copyright disputes are no longer hypothetical: they are real, high-stakes, and shaping law, creativity and innovation in real time.


