Commentary: Here’s the number that could halt the AI revolution in its tracks

2025-07-27 · Technology
Aura Windfall
Good morning, I'm Aura Windfall, and this is Goose Pod for you. Today is Monday, July 28th. I'm here to explore the heart of the matter, to find the truth in the tech that shapes our world.
Mask
And I'm Mask. We are here to discuss a number that could stop the AI revolution dead in its tracks. A trillion-dollar question mark hanging over the entire industry. Let's get into it.
Aura Windfall
Let's get started. When we talk about AI, the numbers are always astronomical. We hear about billions in funding, trillions in market value. It creates a sense of unstoppable momentum, a feeling that this is the future, arriving at lightning speed. It's truly awe-inspiring.
Mask
It's not about awe; it's about scale. Meta, Amazon, Alphabet, Microsoft are pouring an expected $320 billion into AI this year alone. Nvidia is worth over $4 trillion. These aren't just numbers; they're validation. They are the brute force of capital saying, "This is happening."
Aura Windfall
But what I know for sure is that not all numbers validate. Some serve as a powerful warning. There's another figure that has emerged, one that speaks not of investment, but of consequence. It’s a number that represents a potential debt, a staggering $1.05 trillion.
Mask
A potential, theoretical debt. It comes from a class-action lawsuit against the AI firm Anthropic. A group of authors claims Anthropic’s AI, Claude, was trained by pirating millions of books. If a jury decides the infringement was willful, the maximum damages could hit $150,000 per book.
Aura Windfall
And when you multiply that by the estimated number of works, the figure becomes almost unimaginable. Edward Lee, a law expert at Santa Clara University, said this puts Anthropic in a "legal fight for its very existence." It’s a chilling thought, that an entire company could be erased.
Mask
It's a headline number. The reality is more complex. The case hinges on whether the AI company knowingly used pirated material from what are called 'shadow libraries'—vast, unauthorized online collections of books and articles. It’s a messy but necessary fight about data foundations.
Aura Windfall
"Shadow libraries," there's something so revealing in that name. It suggests something happening out of sight, in a gray area. And the judge in this case, Judge William Alsup, seems to have shone a very bright light on it, certifying it as a class action.
Mask
The class action streamlines it. Instead of a million lawsuits, you get one big one. It’s more efficient. But Alsup’s ruling was nuanced. He actually found that the act of training an AI on copyrighted material could be considered "fair use," which is a huge win for the industry.
Aura Windfall
But he drew a line at how the material was obtained. It feels like he’s saying the destination might be acceptable, but the journey, the path taken to get there, was fundamentally flawed. That the use of pirated copies smelled like, well, piracy. That seems like a critical distinction.
Mask
It's the key distinction. He basically said, "I'll give you a pass on the training process itself, but you have to answer for where you got your books." It's a Solomonic decision that creates a paradox: a green light for training, but a massive red flag for data sourcing.
Aura Windfall
And it’s not just Anthropic. A separate lawsuit alleges Microsoft downloaded roughly 200,000 pirated books for its own AI. This isn't an isolated issue; it seems to be a widespread practice. A hidden foundation of the entire AI boom, built on the work of creators without their permission.
Mask
It was the only way to get the scale of data required. The plaintiffs argue companies could have used public domain works or licensed content. But that would have taken longer and cost more. To build the future, you have to move fast. Sometimes, that means you break old frameworks.
Aura Windfall
That brings us to the heart of the legal questions here. The central concept seems to be "fair use." It sounds so simple, but I imagine it’s incredibly complex. Could you help us understand what that truly means in this context? What is the spirit of that law?
Mask
Fair use is a feature of U.S. copyright law that allows the use of copyrighted material without permission under certain circumstances. It’s a four-factor test: the purpose of the use, the nature of the work, the amount used, and the effect on the market for the original. It’s a balancing act.
Aura Windfall
So, the AI companies are arguing that training an AI is a new purpose, one that's different from just reading a book? They're not reselling the book; they're using it to teach a machine. That sounds like the core of their argument for this being a "transformative" use.
Mask
Exactly. It's 'transformative.' In the *Bartz v. Anthropic* case, the court called it "spectacularly transformative." The AI isn't a substitute for the book; it's a new tool built by learning from it. The goal isn’t consumption of the original work, but the creation of a new system with new capabilities.
Aura Windfall
I see. The court is saying that the *process* of learning is fair. But then there’s the other side of it—the market harm. In a similar case, *Kadrey v. Meta*, the court seemed more open to the idea of "market dilution," where AI-generated content could crowd out human authors.
Mask
Right, but the plaintiffs in the Meta case failed to prove it. That's the key. You can't just say AI *might* harm the market. You have to show that Meta's Llama models are actually reproducing the authors' work or directly substituting for it. They couldn't. It's a high bar.
Aura Windfall
So, we have these two major rulings in June of 2025 that seem to diverge slightly. The Anthropic ruling focuses on the source of the data—pirated versus legally acquired—while the Meta ruling focuses more on proving actual market harm from the AI's output. It’s a complicated picture.
Mask
It's not that complicated. The message is: how you get your data matters. In the Anthropic case, the judge made a clear distinction. Training on books you bought? Fair use. Building a permanent library of books you downloaded from a pirate site? That's infringement. The source is everything.
Aura Windfall
What I find so fascinating is the journey these companies are on. It feels like a reflection of human growth. We often learn by making mistakes, by pushing boundaries, and then we have to face the consequences and learn to do things the right way. Is that what we're seeing here?
Mask
It's a pragmatic pivot, not an emotional journey. Anthropic initially used pirated data to move fast. Now, facing legal fire, they've reportedly spent millions to acquire books legally. They are destroying the physical copies after scanning them. They're cleaning up their supply chain because they have to.
Aura Windfall
And this issue isn't new, is it? I see that back in August 2023, the U.S. Copyright Office already affirmed that only humans can be authors. Then in April 2024, the Generative AI Copyright Disclosure Act was introduced in Congress, pushing for transparency in training data. The conversation has been building for a while.
Mask
It's been building because the technology is outpacing the law. Lawmakers are playing catch-up. The New York Times sued OpenAI in December 2023. Artists sued Stability AI. These are just the early skirmishes in a long war to define the legal boundaries of a new technological frontier.
Aura Windfall
And it’s not just a U.S. issue. It seems every country is wrestling with this. In November 2023, a court in China actually recognized copyright for an AI-generated image because it reflected human effort. The EU has enacted its own AI Act. This is a global challenge.
Mask
It’s a global race. And the jurisdictions that create clear, innovation-friendly rules will win. The EU is focused on risk and regulation. China is mandating labels and holding companies liable. The U.S. is letting it play out in the courts. It's a high-stakes legislative experiment.
Aura Windfall
It’s a powerful reminder that behind every line of code and every algorithm, there are fundamental human questions about creativity, ownership, and fairness that we all need to answer together. It's about finding a path that honors both innovation and the human spirit that fuels it.
Mask
It's about establishing a clear, predictable legal framework so we can get on with the business of building the future. Uncertainty is expensive, and legal battles are a drag on progress. This Anthropic case, for better or worse, is forcing the issue into the light. It's painful, but necessary.
Aura Windfall
The central conflict here feels like a clash of two powerful truths. On one hand, the truth of innovation, the drive to create something new and transformative. On the other, the truth of creation, the right of an artist or author to own and benefit from their work. It's a profound tension.
Mask
It's not tension, it's evolution. The judge in the Anthropic case said AI training is "quintessentially transformative." He compared it to how humans learn: "Everyone reads texts, too, then writes new texts... To make anyone pay... each time they recall it from memory... would be unthinkable."
Aura Windfall
That's a beautiful metaphor, and I see the wisdom in it. But the metaphor breaks down when you consider the source. A person reads a book they bought or borrowed legally. This case is about using millions of books that were, allegedly, stolen. The foundation is what’s in question.
Mask
Which is why the ruling is so important. It separates the two. It creates a roadmap. Yes, the learning process is fair use. But no, you don't get a "free pass" for piracy. One legal expert put it perfectly: "Fair use protects the learning, not the theft." It draws a clear line.
Aura Windfall
So, the conflict is really about defining where learning ends and theft begins in the digital age. What I know for sure is that when something is available online, it doesn't automatically mean it's free for any use. Copyright still exists, whether a book is on a shelf or on a website.
Mask
Of course, but the nature of that use is what's being debated. The AI isn't reprinting the books. It's creating statistical models of language. The value isn't in the individual works anymore; it's in the aggregate patterns of all the works combined. It's a fundamentally different thing.
Aura Windfall
But it couldn’t exist without the original works. It feels like building a magnificent house with bricks taken from other people's homes without asking. The final structure may be new and impressive, but the owners of the original homes are left with holes in their walls.
Mask
That's a flawed analogy. A better one is this: we've built a universal library, and instead of letting people check out one book at a time, we've built a librarian who has read every book and can now answer any question you have. The librarian isn't handing you the book, just the knowledge inside.
Aura Windfall
That's a compelling image, but it still sidesteps the core ethical question. Did we have the right to build that library with books we didn't pay for? The court seems to be saying "no." It's forcing a shift from a "Wild West" of data scraping to a more accountable system.
Mask
It's forcing a shift to a more expensive system. And that means the companies with the deepest pockets, the ones who can afford to license or buy data at scale, will have a massive advantage. This ruling could inadvertently kill off smaller startups and entrench the big players. It’s a moat built of legal compliance.
Aura Windfall
And what is the impact of that? On one hand, it feels right that creators should be compensated. This could create a whole new, sustainable revenue stream for the publishing industry and for authors themselves, which could be a beautiful outcome. A way of funding the arts in the age of AI.
Mask
It could, or it could just become another line item on a FAANG company's budget. The real impact is that data provenance—the origin of your training data—is now a critical business concern. It's no longer just a technical problem; it's a multi-billion dollar legal liability. Data due diligence is the new normal.
Aura Windfall
So, we're moving from an era of "move fast and break things" to one of "move carefully and document everything." It sounds like a maturation process for the entire industry. A new level of accountability is being demanded, not just by regulators, but by the courts.
Mask
It's a strategic shift. Compliance is becoming a brand value. Your customers and partners will want to know if your model was trained ethically and legally. Transparency isn't just a nice-to-have anymore; it’s a survival strategy. Legality is becoming a competitive advantage. That's the real impact.
Aura Windfall
I love that framing. That integrity and transparency can become a source of strength and trust. This ruling, with its immense potential penalty, serves as a powerful deterrent. It sends a clear message that the days of 'download first, ask permission never' are coming to an end.
Mask
It does. But it also puts immense pressure on these companies to settle. No company wants to put its fate in the hands of a jury that might be swayed by a trillion-dollar headline. The plaintiffs suddenly have a lot of leverage to negotiate licensing deals. That’s the immediate financial game-changer.
Aura Windfall
So, the broader societal impact is that we're being forced to have a conversation about the value of creative work in the age of intelligent machines. This lawsuit is a catalyst for that difficult, but absolutely necessary, dialogue about how we build a future that is both innovative and just.
Aura Windfall
Looking forward, what does a sustainable future look like? How do we find a balance? It seems like new licensing frameworks, similar to how the music industry handles rights, are an inevitable and perhaps hopeful path forward. A way to create a bridge between creators and innovators.
Mask
It's one potential path. Compulsory licenses or collective rights organizations could streamline the process. But the law is moving slowly. This is just one ruling in one district court. We could see different rulings in other circuits, and this will likely end up at the Supreme Court or be settled by new legislation.
Aura Windfall
So, the path forward is still uncertain. But it seems the principle has been established. The future of AI development must include a strategy for acquiring data that is not only effective but also ethical and legal. The 'how' has become just as important as the 'what'.
Mask
The future is one where data sourcing is a core part of your IP strategy. Companies that can build clean, legally-sourced, high-quality datasets will win. The technology is transformative, but as the court said, that "doesn't give you a free pass to transform other people's property into your training data."
Aura Windfall
So, the key takeaway is that the AI industry's foundational layer is being rebuilt under the bright light of legal scrutiny. What I know for sure is that true, lasting innovation must be built on a foundation of respect for the work that has come before.
Mask
That's the end of today's discussion. The trillion-dollar question has been asked, and now the industry has to build a trillion-dollar answer. Thank you for listening to Goose Pod. See you tomorrow.

## Commentary: The $1.05 Trillion Threat to the AI Revolution

**News Title:** Commentary: Here’s the number that could halt the AI revolution in its tracks
**Report Provider:** Los Angeles Times
**Author:** Michael Hiltzik
**Publication Date:** July 25, 2025

This article from the Los Angeles Times, authored by Michael Hiltzik, explores a critical legal challenge that could significantly impact the artificial intelligence (AI) industry. While the AI sector is buoyed by substantial investments and market valuations, a potential financial liability of **$1.05 trillion** for the AI firm Anthropic could serve as a major deterrent to current AI development practices.

### Key Findings and Conclusions:

* **The Core Issue:** The central concern is the alleged willful pirating of copyrighted books by AI firms to "train" their AI bots. This practice, if proven, could lead to massive statutory damages.
* **Anthropic's Legal Predicament:** Anthropic faces a class-action copyright infringement lawsuit alleging the pirating of up to 7 million copyrighted books.
* **Potential Business-Ending Liability:** If a jury finds Anthropic guilty of willful infringement and imposes the maximum statutory damages of $150,000 per work, the company could be liable for up to **$1.05 trillion**. This figure is vastly higher than Anthropic's estimated annual revenue of $3 billion and private market value of $100 billion, posing an existential threat.
* **Judge Alsup's Ruling:** U.S. District Judge William Alsup certified the copyright infringement lawsuit against Anthropic as a class action. While he initially ruled that using downloaded material for AI training could be considered "fair use," he also found that Anthropic's retention of this material for other purposes, such as building its own research library, constituted piracy.
* **Impact of Class Certification:** The class certification streamlines the litigation, consolidating potentially millions of individual lawsuits into a single proceeding before one jury. This significantly increases the pressure on AI firms.
* **Broader Industry Implications:** The ruling could encourage other plaintiffs to add piracy claims to their lawsuits and pressure AI defendants to reach licensing deals with content creators to avoid the risk of jury trials.

### Key Statistics and Metrics:

* **OpenAI Funding:** $40 billion
* **Expected AI Investments by Meta, Amazon, Alphabet, and Microsoft (this year):** $320 billion
* **Market Value of Nvidia Corp.:** $4.2 trillion
* **Anthropic's Potential Liability:** $1.05 trillion (if 7 million books are found infringed and maximum damages of $150,000 per work are awarded)
* **Anthropic's Estimated Annual Revenue:** Approximately $3 billion
* **Anthropic's Estimated Private Market Value:** Approximately $100 billion
* **Statutory Damages for Copyright Infringement:** Ranges from $750 per work to $150,000 per work for willful infringement
* **Allegedly Downloaded Works by Anthropic:** Up to 7 million (number to be finalized by the September 1 deadline)
* **Microsoft Allegation:** Downloaded approximately 200,000 pirated books via Books3 to train its AI bot, Megatron

### Important Recommendations (Implied):

* **AI Firms:** The article implicitly suggests that AI firms should prioritize securing licensing agreements for content used in AI training to mitigate legal risks.
* **Content Creators:** The ruling empowers authors and other creators to pursue legal action and potentially secure compensation for the use of their copyrighted works.

### Significant Trends or Changes:

* **Shift in Judicial Sentiment:** While some judges have leaned towards "fair use" for AI training, Judge Alsup's ruling highlights a growing judicial scrutiny of the methods used to acquire training data.
* **Increased Pressure for Licensing:** The potential for massive financial penalties is likely to drive AI companies towards proactive licensing negotiations.

### Notable Risks or Concerns:

* **Existential Threat to AI Firms:** The possibility of business-ending liability for copyright infringement poses a significant risk to the AI industry's growth and development.
* **Stifling Innovation:** If the current trend of aggressive litigation and potential penalties continues, it could stifle innovation and investment in AI.
* **Widespread Use of Shadow Libraries:** The article suggests that the use of "shadow libraries" like LibGen and PiLiMi is widespread within the AI industry, indicating a systemic issue.

### Material Financial Data:

The most critical financial data point is the potential **$1.05 trillion** liability for Anthropic, which dwarfs its current financial standing. This figure underscores the immense financial stakes involved in copyright infringement cases related to AI training data. The article also highlights the substantial investments being made in AI (e.g., $320 billion by major tech companies) and the market valuation of key players like Nvidia ($4.2 trillion), contrasting these with the potential downside risk presented by copyright litigation.

In essence, the article argues that while the AI industry is experiencing a boom, the legal ramifications of its data acquisition practices, particularly concerning copyrighted material, could lead to crippling financial penalties, potentially forcing a significant shift in how AI is developed and trained.
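The statutory-damages arithmetic behind the headline figures above can be sketched in a few lines. This is a back-of-the-envelope illustration of the numbers reported in the article (up to 7 million works, $750 to $150,000 per work), not a legal estimate; the `liability` helper is invented for this sketch.

```python
# Back-of-the-envelope sketch of the statutory-damages range reported in
# the article: up to 7 million allegedly infringed works, with statutory
# damages of $750 (minimum) to $150,000 (willful maximum) per work.

def liability(works: int, per_work_damages: int) -> int:
    """Total statutory damages for a given class size and per-work award."""
    return works * per_work_damages

worst_case = liability(7_000_000, 150_000)  # willful-infringement maximum
best_case = liability(7_000_000, 750)       # statutory minimum

print(f"worst case: ${worst_case:,}")  # worst case: $1,050,000,000,000
print(f"best case:  ${best_case:,}")   # best case:  $5,250,000,000
```

Note that even the statutory minimum, about $5.25 billion, would exceed Anthropic's estimated $3 billion in annual revenue, which is why the article frames keeping any award at the low end as the company's central challenge at trial.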


The artificial intelligence camp loves big numbers. The sum raised by OpenAI in its latest funding round: $40 billion. Expected investments on AI by Meta, Amazon, Alphabet and Microsoft this year: $320 billion. Market value of Nvidia Corp., the supplier of chips for AI firms: $4.2 trillion. Those figures are all taken by AI adherents as validating the promise and potential of the new technology.

But here’s a figure that points in the opposite direction: $1.05 trillion. That’s how much the AI firm Anthropic could be on the hook for if a jury decides that it willfully pirated 7 million copyrighted books in the course of “training” its AI bot Claude, and if the jury decides to smack it with the maximum statutory damages of $150,000 per work.

“Anthropic faces at least the potential for business-ending liability.” — Edward Lee, Santa Clara University School of Law

That places Anthropic in “a legal fight for its very existence,” reckons Edward Lee, an expert in intellectual property law at the Santa Clara University School of Law. The threat arose July 17, when U.S. District Judge William Alsup certified a copyright infringement lawsuit brought by several published authors against Anthropic as a class action. I wrote about the case last month. At that time, Alsup had rejected the plaintiffs’ copyright infringement claim, finding that Anthropic’s use of copyrighted material to develop its AI bot fell within a copyright exemption known as “fair use.” But he also found that Anthropic’s downloading of copies of 7 million books from online “shadow libraries,” which included countless copyrighted works, without permission, smelled like piracy. “We will have a trial on the pirated copies ... and the resulting damages,” he advised Anthropic, ominously.

He put meat on those bones with his subsequent order, designating the class as copyright owners of books Anthropic downloaded from the shadow libraries LibGen and PiLiMi. (Several of my own books wound up in Books3, another such library, but Books3 isn’t part of this case and I don’t know whether my books are in the other libraries.)

The class certification could significantly streamline the Anthropic litigation. “Instead of millions of separate lawsuits with millions of juries,” Alsup wrote in his original ruling, “we will have a single proceeding before a single jury.”

The class certification adds another wrinkle — potentially a major one — to the ongoing legal wrangling over the use of published works to “train” AI systems.

The process involves feeding enormous quantities of published material — some of it scraped from the web, some of it drawn from digitized libraries that can include copyrighted content as well as material in the public domain. The goal is to provide AI bots with enough data to enable them to glean patterns of language that they can regurgitate, when asked a question, in a form that seems to be (but isn’t really) the output of an intelligent entity.

Authors, musicians and artists have filed numerous lawsuits asserting that this process infringes their copyrights, since in most cases they haven’t granted permission or been compensated for the use. One of the most recent such cases, filed last month in New York federal court by authors including Kai Bird — co-author of “American Prometheus,” which became the authorized source of the movie “Oppenheimer” — charges that Microsoft downloaded “approximately 200,000 pirated books” via Books3 to train its own AI bot, Megatron.

Like many of the other copyright cases, Bird and his fellow plaintiffs contend that the company could have trained Megatron using works in the public domain or obtained under licensing. “But either of those would have taken longer and cost more money than the option Microsoft chose,” the plaintiffs state: to train its bot “without permission and compensation as if the laws protecting copyrighted works did not exist.”

I asked Microsoft for a response, but haven’t received a reply. Among judges who have pondered the issues, the tide seems to be building in favor of regarding the training process as fair use. Indeed, Alsup himself came to that conclusion in the Anthropic case, ruling that use of the downloaded material for AI training was fair use — but he also heard evidence that Anthropic had held on to the downloaded material for other purposes — specifically to build a research library of its own.

That’s not fair use, he found, exposing Anthropic to accusations of copyright piracy.

Alsup’s ruling was unusual, but also “Solomonic,” Lee told me. His finding of fair use delivered a “partial victory” for Anthropic, but his finding of possible piracy put Anthropic in “a very difficult spot,” Lee says.

That’s because the financial penalties for copyright infringement can be gargantuan, ranging from $750 per work to $150,000 — the latter if a jury finds that the user engaged in willful infringement. As many as 7 million works may have been downloaded by Anthropic, according to filings in the lawsuit, though an undetermined number of those works may have been duplicated in the two shadow libraries the firm used, and may also have been duplicated among copyrighted works the firm actually paid for.

The number of works won’t be known until at least Sept. 1, the deadline Alsup has given the plaintiffs to submit a list of all the allegedly infringed works downloaded from the shadow libraries. If subtracting the duplicates brings the total of individual infringed works to 7 million, a $150,000 bill per work would total $1.05 trillion. That would financially swamp Anthropic: The company’s annual revenue is estimated at about $3 billion, and its value on the private market is estimated at about $100 billion.

“In practical terms,” Lee wrote on his blog, “ChatGPT is eating the world,” class certification means “Anthropic faces at least the potential for business-ending liability.”

Anthropic didn’t reply to my request for comment on that prospect. In a motion asking Alsup to send his ruling to the 9th U.S. Circuit Court of Appeals or to reconsider his finding himself, however, the company pointed to the blow that his position would deliver to the AI industry. If his position were widely adopted, Anthropic stated, then “training by any company that downloaded works from third-party websites like LibGen or Books3 could constitute copyright infringement.” That was an implicit admission that the use of shadow libraries is widespread in the AI camp, but also a suggestion that since it’s the shadow libraries that committed the alleged piracy, the AI firms that used them shouldn’t be punished.

Anthropic also noted in its motion that the plaintiffs in its case didn’t raise the piracy issue themselves — Alsup came up with it on his own, by treating the training of AI bots and the creation of a research library as two separate uses, the former allowed under fair use, the latter disallowed as an infringement.

That deprived Anthropic of an opportunity to respond to the theory in court. The firm observed that a fellow federal judge in Alsup’s San Francisco courthouse, Vince Chhabria, came to a contradictory conclusion only two days after Alsup, absolving Meta Platforms of a copyright infringement claim on similar facts, based on the fair use exemption.

Alsup’s class certification is likely to roil both the plaintiff and defendant camps in the ongoing controversy over AI development. Plaintiffs who haven’t made a piracy claim in their lawsuits may be prompted to add it. Defendants will come under greater pressure to forestall lawsuits by scurrying to reach licensing deals with writers, musicians and artists.

That will happen especially if another judge accepts Alsup’s argument about piracy. “That may well encourage other lawsuits,” Lee says.

For Anthropic, the challenge will be “trying to convince a jury that the award of damages should be $750 per work,” Lee says. Alsup’s ruling makes this case one of the rare lawsuits in which “the plaintiffs have the upper hand,” now that they have won class certification. “All these companies will have great pressure to negotiate settlements with plaintiffs; otherwise, they’re at the mercy of the jury, and you can’t bank on anything in terms of what a jury might do.”
