Commentary: An AI firm won a lawsuit for copyright infringement — but may face a huge bill for piracy

2025-06-29 · Technology
Ji Fei
Good morning, Wang Kang! I'm Ji Fei, and this is "Goose Pod," a podcast made just for you. Today is Sunday, June 29. Our topic today: an AI company won a copyright infringement lawsuit, but it may be facing a huge bill for piracy.
Guo Rong
Well, Ji Fei, this one sounds full of drama! Let's get right into it. Two recent rulings from the federal court in San Francisco look, on the surface, like wins for AI companies in their copyright disputes, but dig a little deeper and you'll find things are nowhere near that simple. It's a real tangle!
Ji Fei
Indeed. Let's start with the Anthropic case. U.S. Judge William Alsup ruled that using copyrighted works to train large language models is a "transformative" use that falls under copyright law's "fair use" exemption and so does not constitute infringement. Anthropic was quite pleased with itself, saying the court had recognized its model training as transformative, "spectacularly so."
Guo Rong
Right, it sounds like a big victory for AI companies, doesn't it? But hold on, don't celebrate too soon! Judge Alsup then pointed out that Anthropic had downloaded more than 7 million copyrighted works from online "shadow libraries" without authorization. The judge called that conduct "inherently, irredeemably infringing." Now that's where it gets interesting.
Ji Fei
Yes. In Judge Alsup's words, Anthropic "could have purchased books, but it preferred to steal them." He also warned Anthropic that it will face a separate trial over those pirated copies, with damages that could run to "untold millions of dollars." Talk about hot and cold at once: training is fair use on one hand, but the data source was pirated on the other. You hardly know whether to laugh or cry.
Guo Rong
Ha, that's an apt way to put it! It's like being allowed to study library books but buying seven million bootleg copies off the black market instead. Even better: another judge at the same courthouse, Vince Chhabria, also ruled in a similar case against Meta that training AI was fair use. Yet his opinion laid out a roadmap for copyright holders to win in the future. That's what makes it interesting.
Ji Fei
So this long contest between copyright holders and AI developers is far from over. Los Angeles copyright attorney Adam Moss put it bluntly: "Neither case is going to be the last word." There are more than 40 similar lawsuits around the country, with billions, even trillions, of dollars at stake. This legal war will very likely go all the way to the Supreme Court, and that is years away.
Guo Rong
Exactly. At the heart of the dispute is how AI models actually "learn." Developers feed their chatbot models enormous amounts of material: data scraped from the web, millions of books, articles, papers and so on, much of it copyrighted. The article's author, for one, found three of his own books listed in such a collection without permission. You hardly know whether to laugh or cry.
Ji Fei
That example really drives it home. Writers and artists argue that using their work without permission to train AI is copyright infringement unless they are paid. AI developers invoke the "fair use" doctrine: if only limited material is drawn from a work, the end product is "transformative," and it doesn't significantly cut into the market for the original, then the use is fair.
Guo Rong
That is indeed the crux of the fight. To be fair, Anthropic did spend millions of dollars buying print books to train its AI, and the judge accepted that as legitimate. The problem lies with the 7 million books downloaded from "shadow libraries." The judge put it bluntly: Anthropic "could have purchased books, but it preferred to steal them" to avoid the "slog." Quite a logic.
Ji Fei
Anthropic responded that it acquired the books solely to build large language models, and that the court had already found that use to be fair. But Judge Alsup found that Anthropic's goal was not only to train LLMs; it also wanted to build a general library "for research" or to "inform our products." That certainly complicates things.
Guo Rong
What complicates things further is Judge Chhabria's ruling in the Meta case. He explicitly disagreed with Alsup. He observed that companies "have been unable to resist the temptation" to feed copyrighted material into their models without permission. Is that illegal? His answer: "Although the devil is in the details, in most cases the answer will be yes."
Ji Fei
Judge Chhabria's reasoning is very clear. He argued that a flood of AI-generated works would "dramatically undermine the market" for originals and thereby "dramatically undermine the incentive for human beings to create things the old-fashioned way." Protecting that incentive, he stressed, is exactly what copyright law is for. AI-generated books may not hurt Agatha Christie, he noted, but they could well keep the next Agatha Christie from getting noticed or from continuing to write.
Guo Rong
Chhabria all but pleaded with the plaintiffs to show evidence that AI was harming their market. As he put it, it is hard to imagine it can be fair use to make billions from copyrighted books while enabling an endless stream of competing works that could significantly harm the market for those books. Unfortunately, the plaintiffs' lawyers "never so much as mentioned it," leaving him no choice but to rule for Meta. A missed opportunity if ever there was one!
Ji Fei
That really is the sticking point. Two judges in the same courthouse split on how to read "fair use." Alsup held that training is fair use but that the pirated sourcing is infringement; Chhabria put more weight on how AI-generated content affects the market for originals, suggesting that unlicensed training itself will usually be infringing. The picture is murky indeed.
Guo Rong
Ha, a vivid comparison! Reading and studying at my place may be fair use, but if the books were stolen from the neighbor's house, that's a serious problem. The challenge for AI companies is that they need vast amounts of data to train their models, yet acquiring data at that scale legally is extremely costly and may be hard to pull off in practice. No small dilemma.
Ji Fei
And that is the core conflict of interest between AI companies and copyright holders. AI companies want data for free or at low cost, even arguing that high licensing fees would halt the progress of a historic technology. Judge Chhabria flatly dismissed that argument as "nonsense." If books really are that valuable for training, he reasoned, a licensing market will emerge. So why steal?
Guo Rong
Right. If AI companies can really create trillions in value with this data, why can't they share some of it with the original creators? This isn't just about technological progress; it's about fairness and ethics. Does a technology being "valuable" entitle anyone to take other people's work at will? That is outright contempt for intellectual property!
Ji Fei
It's like a restaurant insisting it needs free ingredients in order to innovate, or the whole industry will stall. The suppliers would ask: why should my work be given away for nothing? That logic doesn't hold up. And if the big AI companies get this data for free, how is the value of small authors' and independent artists' work recognized? How are they supposed to make a living?
Guo Rong
Exactly, and that is what Robin Feldman of UC College of the Law worries about. She believes the end result will be some form of licensing agreement, but the question is where the chips will fall and whether smaller authors will be left out in the cold. Big companies may be able to negotiate; lesser-known creators will struggle to have their voices heard.
Ji Fei
The fallout from this legal war runs deep. For AI companies, the "fair use" rulings may look like wins on the surface, but the potential piracy bill could upend their business models and force them to reexamine whether their data sources are legal. And it is no small sum: "untold millions of dollars," perhaps more. A bolt from the blue.
Guo Rong
Yes, it's like winning the race but getting fined because your training methods broke the rules. The impact on the creative industries is enormous. If AI trains on vast numbers of works for free and generates similar content, the market value of original works gets diluted and creators lose their incentive. That could keep the "next Agatha Christie" from ever emerging. A real shame.
Ji Fei
Exactly, and it goes to the heart of what kind of creative ecosystem we want: one that encourages originality and respects intellectual property, or one that lets AI "learn" and "create" for free. The lawsuit Disney, NBCUniversal and other studios filed against Midjourney shows the fight is escalating, and copyright holders do not look ready to back down.
Guo Rong
These rulings will also shape AI's future. If AI companies can't freely use "shadow library" data, they will have to find legal sources: buy the works, or strike licensing deals with rights holders. That will raise development costs, no doubt, but it could also foster a fairer, more sustainable AI ecosystem. Which would be a good thing.
Ji Fei
It should push the AI industry toward better practices. It is also a reminder to creators to watch whether their work is being misused and to defend their copyrights. After all, as Judge Chhabria suggested, plaintiffs who can produce evidence that AI is harming their market stand a strong chance of winning.
Guo Rong
So where does this "AI versus copyright" war go from here? As things stand, licensing agreements look inevitable. AI companies will need to acquire data legally, and copyright holders will need to be fairly compensated through licenses. If AI companies want to build skyscrapers of knowledge, they should buy the land under the foundation first rather than simply occupy it. The logic is clear enough.
Ji Fei
Right, and the nature and scale of those agreements will depend on how the courts ultimately rule. Chhabria's ruling noted that Meta had tried to negotiate licensing deals but "abandoned its licensing efforts" after realizing that a shadow library it had downloaded already contained most of the works. So even a giant company considered the legal route and then chose the shortcut. Food for thought.
Guo Rong
Looking ahead, expect more action on the legislative front. In the U.S., the proposed Generative AI Copyright Disclosure Act of 2024 would require AI companies to disclose the sources of their training data for greater transparency, and the No AI FRAUD Act aims to keep AI from impersonating individuals. Regulators are clearly trying to catch up with the pace of AI, which is a good thing.
Ji Fei
Yes, and legislation is moving globally. The EU's AI Act requires developers to document their training data in detail so that it is transparent and traceable. China's legal framework is evolving too, and at least one court decision there has recognized copyright protection for an AI-generated image, unlike the U.S., which recognizes only human authors. All of this will shape how AI develops worldwide and is worth watching.
Guo Rong
So the future of AI will hinge not just on technology but on law and ethics. As listeners, it is worth asking: in the AI era, how should human creativity be protected and rewarded? And what kind of AI future do we want to see? Questions well worth pondering.
Guo Rong
That wraps up today's discussion, Wang Kang. The AI copyright fight is a complex, high-stakes battle of attrition, and it is bound to reshape the future of both the creative industries and AI companies.
Ji Fei
Thanks for listening, Wang Kang. We hope today's episode gave you a deeper understanding of this important topic. This has been Goose Pod. See you tomorrow!

# Comprehensive News Summary: AI Copyright Infringement Lawsuits

## News Metadata

* **News Title**: Commentary: An AI firm won a lawsuit for copyright infringement — but may face a huge bill for piracy
* **Topic**: Technology / AI
* **Report Provider/Author**: Los Angeles Times / Michael Hiltzik
* **Date/Time Period Covered**: Published on 2025-06-27 10:00:06 (describing rulings from Monday and Wednesday of that week, and earlier filings).
* **URL**: [https://www.latimes.com/business/story/2025-06-27/an-ai-firm-won-a-lawsuit-over-copyright-infringement-but-may-face-a-huge-bill-for-piracy](https://www.latimes.com/business/story/2025-06-27/an-ai-firm-won-a-lawsuit-over-copyright-infringement-but-may-face-a-huge-bill-for-piracy)
* **Relevant News Identifiers**: Focuses on two recent federal court rulings in San Francisco regarding AI training data and copyright fair use.

---

## Summary of Key Developments

The article discusses two recent federal court rulings in San Francisco concerning AI firms' use of copyrighted material for training chatbots, revealing a complex and contradictory legal landscape. While superficially appearing as wins for AI companies on the "fair use" front, both rulings contain significant caveats that suggest a long and uncertain legal battle ahead.

### 1. Anthropic Lawsuit (U.S. Judge William Alsup)

* **Defendant**: Anthropic (developer of the Claude chatbot).
* **Plaintiffs**: Novelist Andrea Bartz and nonfiction writers Charles Graeber and Kirk Wallace Johnson.
* **Main Finding (Fair Use)**: Judge Alsup ruled that using copyrighted works to train large language models (LLMs) like Anthropic's Claude is **"transformative"** and therefore falls within the **"fair use" exemption** from copyright infringement. Anthropic stated, "We are pleased that the Court recognized that using works to train [large language models] was transformative — spectacularly so."
* **Critical Caveat (Piracy)**: Despite the fair use finding for training, Alsup also noted that Anthropic had downloaded copies of **more than 7 million books** from online "shadow libraries" without permission. Alsup declared this action **"inherently, irredeemably infringing"** and stated, "Anthropic could have purchased books, but it preferred to steal them to avoid ‘legal/practice/business slog.’"
* **Consequence**: Anthropic will face a separate trial on these pirated copies and the resulting damages, which could expose the company to judgments worth **"untold millions of dollars."**
* **Anthropic's Defense**: Anthropic argued its acquisition of books was "for one purpose only — building LLMs — and the court clearly held that use was fair."
* **Alsup's Counter**: Alsup found Anthropic's goal was not solely LLM training but also to create a general library for "research" or to "inform our products."

### 2. Meta Platforms Lawsuit (U.S. Judge Vince Chhabria)

* **Defendant**: Meta Platforms.
* **Plaintiffs**: Comedian Sarah Silverman and **12 other published authors**.
* **Main Finding (Summary Judgment for Meta)**: Judge Chhabria granted Meta's motion for summary judgment, ruling that Meta's training of AI bots on copyrighted works was defensible as fair use *in this specific case*.
* **Critical Caveat (Roadmap for Future Plaintiffs)**: Chhabria explicitly disagreed with Alsup's broader interpretation of fair use for AI training. He stated that using copyrighted materials without permission to train models is generally illegal: "Although the devil is in the details, in most cases the answer will be yes."
* **Chhabria's Rationale**: He argued that a flood of AI-generated works could **"dramatically undermine the market"** for original works, and thereby **"dramatically undermine the incentive for human beings to create things the old-fashioned way,"** which is the core purpose of copyright law.
* **Why Plaintiffs Lost**: Chhabria lamented that the plaintiffs' lawyers failed to provide evidence showing that AI-generated works were affecting their market. He stated, "It’s hard to imagine that it can be fair use to use copyrighted books...to make billions or trillions of dollars while enabling the creation of a potentially endless stream of competing works that could significantly harm the market for those books."
* **Meta's Licensing Efforts**: Court filings revealed Meta had attempted to negotiate licensing agreements but "abandoned its licensing efforts" after realizing a shadow library it had downloaded already contained most of the works it sought to license.
* **AI Firms' Argument on Licensing Costs**: AI firms claim licensing will be prohibitively expensive, halting technological progress. Chhabria dismissed this as "nonsense," suggesting a market for book licensing would emerge if the training value is as high as claimed.

## Broader Context and Implications

* **Conflicting Rulings**: The differing opinions from judges in the same courthouse create significant legal confusion, making the situation "clear as mud."
* **Ongoing Legal Battle**: This is far from the "last word." There are **over 40 lawsuits** on court dockets around the country concerning AI and copyright.
* **Stakes**: **Billions of dollars, even trillions,** are at stake for AI developers and the artistic community.
* **Future Outlook**: The issue is expected to reach the Supreme Court, likely years from now.
* **New Lawsuits**: Walt Disney Co., NBCUniversal, and other studios recently filed a copyright lawsuit against Midjourney, another AI developer.
* **Potential Resolution**: Robin Feldman, director of the Center for Innovation at UC College of the Law, believes the "end result will be some form of licensing agreement," though the terms and impact on smaller authors remain uncertain.
* **Author's Personal Experience**: The article's author notes that **three of his eight books** are listed in one such collection without his permission, highlighting the widespread nature of the issue.

The core tension remains between AI firms seeking free access to vast datasets for training and copyright holders demanding compensation for their intellectual property, especially given the potential for AI to generate competing works.

Commentary: An AI firm won a lawsuit for copyright infringement — but may face a huge bill for piracy

Read original at Los Angeles Times

To judge from the reaction among the AI crowd, a federal judge’s Monday ruling in a copyright infringement case was a clear win for all the AI firms that use published material to “train” their chatbots. “We are pleased that the Court recognized that using works to train [large language models] was transformative — spectacularly so,” Anthropic, the defendant in the lawsuit, boasted after the ruling.

“Transformative” was a key word in the ruling by U.S. Judge William Alsup of San Francisco, because it’s a test of whether using copyrighted works falls within the “fair use” exemption from copyright infringement. Alsup ruled that using copyrighted works to train bots such as Anthropic’s Claude is indeed fair use, and not a copyright breach.

“(Anthropic) could have purchased books, but it preferred to steal them.” — U.S. Judge William Alsup

Anthropic had to acknowledge a troubling qualification in Alsup’s order, however. Although he found for the company on the copyright issue, he also noted that it had downloaded copies of more than 7 million books from online “shadow libraries,” which included countless copyrighted works, without permission.

That action was “inherently, irredeemably infringing,” Alsup concluded. “We will have a trial on the pirated copies...and the resulting damages,” he advised Anthropic ominously: Piracy on that scale could expose the company to judgments worth untold millions of dollars. What looked superficially like a clear win for AI companies in their long battle to use copyrighted material without paying for it to feed their chatbots now looks clear as mud.

That’s especially true when Alsup’s ruling is paired with a ruling issued Wednesday by U.S. Judge Vince Chhabria, who works out of the same San Francisco courthouse. In that copyright infringement case, brought against Meta Platforms in 2023 by comedian Sarah Silverman and 12 other published authors, Chhabria also ruled that Meta’s training its AI bots on copyrighted works is defensible as fair use.

He granted Meta’s motion for summary judgment. But he provided plaintiffs in similar cases with a roadmap to winning their claims. He ruled in Meta’s favor, he indicated, only because the plaintiffs’ lawyers failed to raise a legal point that might have given them a victory. More on that in a moment.

“Neither case is going to be the last word” in the battle between copyright holders and AI developers, says Adam Moss, a Los Angeles attorney specializing in copyright law. With more than 40 lawsuits on court dockets around the country, he told me, “it’s too early to declare that either side is going to win the ultimate battle.”

With billions of dollars, even trillions, at stake for AI developers and the artistic community, no one expects the law to be resolved until the issue reaches the Supreme Court, presumably years from now. But it’s worthwhile to look at these recent decisions — and a copyright lawsuit filed earlier this month by Walt Disney Co., NBCUniversal and other studios against Midjourney, another AI developer — for a sense of how the war is shaping up.

To start, a few words about chatbot-making. Developers feed their chatbot models on a torrent of material, much of it scraped from the web — everything from distinguished literary works to random babbling — as well as collections holding millions of books, articles, scientific papers and the like, some of it copyrighted.

(Three of my eight books are listed in one such collection, without my permission. I don’t know if any have been “scraped,” and I’m not a party to any copyright lawsuit, as far as I know.) The goal is to “train” the bots to extract facts and detect patterns in the written material that can then be used to answer AI users’ queries in a semblance of conversational language.
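To make "extracting patterns" a little more concrete, here is a deliberately tiny, hypothetical sketch in Python (standard library only). It is not how Claude, Llama or any commercial chatbot is actually built; real systems train neural networks over tokens at vastly larger scale. The toy below merely counts which word tends to follow which across a few sentences and then generates new text from those counts, illustrating the sense in which a trained model stores statistical patterns rather than the source text itself.

```python
# Toy illustration only (not any company's actual pipeline): "training" here means
# counting which word tends to follow which word, i.e. extracting statistical
# patterns from text rather than storing the text itself for retrieval.
from collections import Counter, defaultdict
import random

corpus = [
    "the court ruled that the training use was transformative",
    "the judge ruled that the downloaded copies were pirated",
    "the authors argued that the training use was infringing",
]

# Build bigram counts: for each word, how often each possible next word follows it.
model = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for current_word, next_word in zip(words, words[1:]):
        model[current_word][next_word] += 1

def generate(start: str, length: int = 8, seed: int = 0) -> str:
    """Generate text by repeatedly sampling a statistically likely next word."""
    rng = random.Random(seed)
    word, output = start, [start]
    for _ in range(length):
        followers = model.get(word)
        if not followers:  # no observed continuation for this word
            break
        candidates, weights = zip(*followers.items())
        word = rng.choices(candidates, weights=weights)[0]
        output.append(word)
    return " ".join(output)

print(generate("the"))  # prints a short recombination of the observed patterns
```

Whether that kind of statistical abstraction excuses how the underlying books were obtained is, of course, exactly the question the two rulings split on.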

There are flaws in the process, of course, including the bots’ tendency when they can’t find an answer in their massive hoard of data to make something up. In their lawsuits, writers and artists maintain that the use of their material without permission to train the bots is copyright infringement, unless they’ve been paid.

The AI developers reply that training falls within the “fair use” exemption in copyright law, which depends on several factors — if only limited material is drawn from a copyrighted work, if the resulting product is “transformative,” and if it doesn’t significantly cut into the market for the original work.

That brings us to the lawsuits at hand. Three authors — novelist Andrea Bartz and nonfiction writers Charles Graeber and Kirk Wallace Johnson — sued Anthropic for using their works without permission. In their lawsuit, filed last year, it emerged that Anthropic had spent millions of dollars to acquire millions of print books, new and used, to feed its bots.

“Anthropic purchased its print copies fair and square,” Alsup wrote. It’s generally understood that the owners of books can do almost anything they wish with them, including reselling them. But Anthropic also downloaded copies of more than 7 million books from online “shadow libraries,” which include untold copyrighted works without permission.

Alsup wrote that Anthropic “could have purchased books, but it preferred to steal them to avoid ‘legal/practice/business slog.’” (He was quoting Anthropic co-founder and CEO Dario Amodei.) Anthropic told me by email that “it’s clear that we acquired books for one purpose only — building LLMs — and the court clearly held that use was fair.”

That’s correct as far as it goes. But Alsup found that Anthropic’s goal was not only to train LLMs, but to create a general library “we could use for research” or to “inform our products,” as an Anthropic executive said, according to legal papers. Chhabria’s ruling in the Meta case presented another wrinkle.

He explicitly disagreed with Alsup about whether using copyrighted works without permission to train bots is fair use. “Companies have been unable to resist the temptation to feed copyright-protected materials into their models—without getting permission from the copyright holders or paying them.” He posed the question: Is that illegal?

And answered, “Although the devil is in the details, in most cases the answer will be yes.” Chhabria’s rationale was that a flood of AI-generated works will “dramatically undermine the market” for the original works, and thus “dramatically undermine the incentive for human beings to create things the old-fashioned way.”

Protecting the incentive for human creation is exactly the goal of copyright law, he wrote. “While AI-generated books probably wouldn’t have much of an effect on the market for the works of Agatha Christie, they could very well prevent the next Agatha Christie from getting noticed or selling enough books to keep writing.”

Artists and authors can win their copyright infringement cases if they produce evidence showing the bots are affecting their market. Chhabria all but pleaded for the plaintiffs to bring some such evidence before him: “It’s hard to imagine that it can be fair use to use copyrighted books...to make billions or trillions of dollars while enabling the creation of a potentially endless stream of competing works that could significantly harm the market for those books.”

But “the plaintiffs never so much as mentioned it,” he lamented. As a result, he said, he had no choice but to give Meta a major win against the authors. I asked the six law firms representing the authors for their response to Chhabria’s implicit criticism of their lawyering, but heard back from only one — Boies Schiller Flexner, which told me by email, “despite the undisputed record of Meta’s historically unprecedented pirating of copyrighted works, the court ruled in Meta’s favor.

We respectfully disagree with that conclusion.” All this leaves the road ahead largely uncharted. “Regardless of how the courts rule, I believe the end result will be some form of licensing agreement,” says Robin Feldman, director of the Center for Innovation at UC College of the Law. “The question is where will the chips fall in the deal and will smaller authors be left out in the cold.”

Some AI firms have reached licensing agreements with publishers allowing them to use the latter’s copyrighted works to train their bots. But the nature and size of those agreements may depend on how the underlying issues of copyright infringement play out in the courts. Indeed, Chhabria noted that filings in his court documented that Meta was trying to negotiate such agreements until it realized that a shadow library it had downloaded already contained most of the works it was trying to license.

At that point it “abandoned its licensing efforts.” (I asked Meta to confirm Chhabria’s version, but didn’t get a reply.) The truth is that the AI camp is just trying to get something for free instead of paying for it. Never mind the trillions of dollars in revenue they say they expect over the next decade — they claim that licensing will be so expensive it will stop the march of this supposedly historic technology dead in its tracks.

Chhabria aptly called this argument “nonsense.” If using books for training is as valuable as the AI firms say it is, he noted, then surely a market for book licensing will emerge. That is, it will — if the courts don’t give the firms the right to use stolen works without compensation.
