Cloudflare Changed the Internet, and the Good Days Are Over for the AI Giants

2025-07-05 · Technology
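Ji Fei
Good morning, I'm Ji Fei, and welcome to <Goose Pod>. So, Guo Rong, today we're going to talk about something big.
Guo Rong
That's right, Ji Fei. Today's topic is an interesting one: Cloudflare has suddenly changed the rules of the game. Are the good days really over for the AI giants?
Ji Fei
Right. Let's start with what happened. Starting July 1, Cloudflare, one of the world's major CDN providers, began blocking AI crawlers by default. That's a big deal: it affects nearly 20% of the internet.
Guo Rong
Wow, 20%! It's as if websites used to leave the front door wide open, and anyone could walk in and take material. Now Cloudflare has put a lock on the door. AI companies want in? They have to knock first and get the owner's explicit permission. In one stroke, it hands the initiative back to website owners.
Ji Fei
Exactly. There are two main reasons behind this. The first is technical: many site owners complain that AI crawlers such as OpenAI's GPTBot are too "aggressive," crawling so frequently that they often bring sites to a standstill.
Guo Rong
I get it. It's like a library where the AI crawlers show up with trucks to haul away the books, blocking the road so ordinary visitors can't get in. Vivid enough as an analogy, right?
Ji Fei
Ha, very vivid. The second key reason is copyright and money. Think about it: publishers work hard on their content, AI takes it for free to train on, and they don't get a cent. Nobody would be happy with that.
Guo Rong
Indeed. It's not just "reckless construction," it's "taking without asking." Servers get dragged down and hard work gets taken for nothing, so no wonder there's so much resentment.
Ji Fei
So that's where the conflict comes in. On one side are content creators, demanding control and compensation. Cloudflare's CEO said the new policy is meant to "give publishers the control they deserve."
Guo Rong
The AI giants must be anxious, then. They feed their models on free data, and now the "granary" is effectively being locked. What are they saying?
Ji Fei
The reaction has been strong. They argue that if every bit of data requires authorization, the AI industry can't develop. They're even lobbying to have this kind of scraping defined as "fair use."
Guo Rong
I see. A classic clash of interests. One side says, "If you use my stuff, you pay." The other says, "I'm using your stuff for the sake of progress, so it should be free." With this move, Cloudflare has handed content owners a sword.
Ji Fei
Well said. The direct effect is a shift in the balance of power. AI companies can no longer take data as they please; they have to sit down and negotiate. That could give rise to an entirely new content licensing market.
Guo Rong
It's good news for website owners too! Before, AI gave users the answer directly, so they stopped clicking through to websites, and traffic plummeted. Now this helps content sites win back the visits that should have been theirs. You could call it setting things right.
Ji Fei
Mm. Looking ahead, the key is whether the other CDN giants follow suit. I've also heard that Cloudflare is testing a "pay per crawl" system, trying to build an economic model in which AI pays for access to content.
Guo Rong
Interesting. It feels like the internet is moving from a rough-and-ready era of "free for the taking" toward a fairer one that respects the value of content. I'm actually a little excited about it!
Ji Fei
Yes. Today we talked about how Cloudflare's new rules are reshaping the ecosystem of AI and the internet. It's undoubtedly one of the most important events in tech this year. I'm Ji Fei.
Guo Rong
And I'm Guo Rong. That's all for today's discussion. Thanks for listening to <Goose Pod>, and see you tomorrow!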

### **Summary of ZDNET Report: Cloudflare's Policy Shift Against AI Crawlers**

**News Metadata**

* **Title:** Cloudflare just changed the internet, and it's bad news for the AI giants
* **Provider:** ZDNET
* **Author:** Steven Vaughan-Nichols
* **Publication Date:** July 2, 2025

---

### **Executive Summary**

Effective July 1, the major Content Delivery Network (CDN) Cloudflare has implemented a new default policy to block AI web crawlers from accessing content on its customers' websites. This move reverses the previous opt-out standard: website owners must now explicitly grant permission (opt in) before AI bots can scrape their data. The policy is a direct response to the aggressive behavior of AI crawlers that overload websites and to the widespread, uncompensated use of web content for training AI models. Affecting approximately **20% of the entire web**, this change could fundamentally alter the relationship between content creators and AI companies, potentially forcing the latter to negotiate and pay for data access.

---

### **Key Findings and Policy Changes**

#### **1. New Default Policy: Block by Default**

* **Effective Date:** Starting July 1.
* **Core Change:** For all new websites on its platform, Cloudflare now **blocks AI crawlers by default**. The previous standard required website owners to manually opt out of being crawled.
* **Scope:** The policy impacts Cloudflare's **two million-plus customers**, which collectively represent **20% of the web**.
* **Enhanced Detection:** Cloudflare will also use behavioral analysis and machine learning to identify and block "shadow" scrapers that try to hide their identity.

#### **2. Rationale for the Change**

* **Technical Overload:** Website owners have reported that AI crawlers (e.g., OpenAI's GPTBot, Anthropic's ClaudeBot) are far more aggressive than traditional search bots.
    * They generate massive request volumes, sometimes hitting sites with **hundreds of requests per second**, causing significant slowdowns.
    * As an example of high traffic, GoogleBot alone sends over **4.5 billion requests a month** to sites hosted on Vercel.
* **Copyright and Compensation:** Publishers and creators are frustrated that AI companies are "strip mining" the web for content to train models without consent or compensation, often ignoring protocols like `robots.txt`.
* **Legal Context:** This move comes amid legal battles where courts have sometimes ruled in favor of AI firms (Meta, Anthropic) under the "fair use" doctrine. ZDNET's parent company, Ziff Davis, filed a lawsuit against OpenAI in **April 2025** over alleged copyright infringement.
* **Decline in Publisher Traffic:** The rise of AI-powered search and content generation has led to a sharp decline in traffic to original news sources.
    * **Statistic:** Business Insider's traffic dropped by **55%** between April 2022 and April 2025.
    * **Prediction:** Nicholas Thompson, CEO of The Atlantic, predicted that his staff should "expect traffic from Google to drop to zero" due to AI.

#### **3. Proposed Economic Model: "Pay Per Crawl"**

* Cloudflare has launched a program in private beta called **"Pay Per Crawl."**
* This system allows publishers to set their own prices for AI companies that wish to scrape their content.
* Technically, it will use the **HTTP 402 "Payment Required"** server response, an older but simple-to-implement standard, to manage these paid access requests.

---

### **Industry Reactions and Notable Statements**

* **Matthew Prince, Cloudflare CEO:** The policy aims to *"give publishers the control they deserve and build a new economic model that works for everyone: creators, consumers, tomorrow's AI founders, and the future of the web itself."*
* **Nicholas Thompson, The Atlantic CEO:** *"Until now, AI companies have not needed to pay for content licenses because they could simply take it without repercussions. Now they will need to negotiate."*
* **Sir Nick Clegg, former Meta executive:** In contrast, the former UK Deputy Prime Minister stated that asking artists' permission before scraping copyrighted content will *"basically kill the AI industry."*

---

### **Risks, Concerns, and Future Outlook**

* **Shift in Power Dynamics:** The primary impact is a shift of power from AI companies to content publishers. AI firms may no longer be able to freely take data and will be forced to negotiate licenses or pay for access to a significant portion of the internet.
* **Regulatory Uncertainty:** The move occurs amid a contentious debate over AI and copyright. The U.S. Copyright Office's recent report suggested that mass scraping does not qualify as fair use, but its head was subsequently fired by the Trump administration and replaced with an attorney with no copyright experience.
* **Industry-Wide Implications:** The key question is whether other major CDNs, such as Akamai, will adopt similar policies. For now, the era of unrestricted, free data scraping for AI training has ended for the one-fifth of the internet managed by Cloudflare.

Cloudflare just changed the internet, and it’s bad news for the AI giants

Read original at ZDNET

The major internet Content Delivery Network (CDN), Cloudflare, has declared war on AI companies. Starting July 1, Cloudflare now blocks by default AI web crawlers accessing content from your websites without permission or compensation.

The change addresses a real problem. My own small site, Practical Technology, where I track all my stories, has been slowed dramatically at times by AI crawlers.

It's not just me. Numerous website owners have reported that AI crawlers, such as OpenAI's GPTBot and Anthropic's ClaudeBot, generate massive volumes of automated requests that clog up websites so they're as slow as sludge. The cloud-hosting service Vercel reports that GoogleBot alone bombards the sites it hosts with over 4.5 billion requests a month. These AI bots often crawl sites far more aggressively than traditional search engine crawlers. They sometimes revisit the same pages every few hours or even hit sites with hundreds of requests per second. While the AI companies deny that their bots are to blame, the evidence tells a different story.
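To make "hundreds of requests per second" concrete, here is a minimal Python sketch of how a site operator might flag that kind of burst with a sliding-window counter. This is an illustration only, not how Cloudflare or any vendor actually detects crawlers; the `BurstDetector` class, the client labels, and the 100-requests-per-second budget are all assumptions made up for the example.

```python
from collections import defaultdict, deque


class BurstDetector:
    """Flags clients that exceed a request budget within a sliding time window."""

    def __init__(self, max_requests: int = 100, window_seconds: float = 1.0):
        # Both thresholds are hypothetical; a real site would tune them.
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client id -> timestamps of recent requests

    def record(self, client_id: str, timestamp: float) -> bool:
        """Record one request and return True if the client is over budget."""
        hits = self._hits[client_id]
        hits.append(timestamp)
        # Drop timestamps that have fallen out of the window.
        while hits and timestamp - hits[0] >= self.window_seconds:
            hits.popleft()
        return len(hits) > self.max_requests


detector = BurstDetector(max_requests=100, window_seconds=1.0)

# Hypothetical crawler: 300 requests spread evenly over one second.
crawler_flags = [detector.record("crawler", i / 300) for i in range(300)]

# Hypothetical human reader: one request every two seconds.
reader_flags = [detector.record("reader", i * 2.0) for i in range(10)]

print(any(crawler_flags), any(reader_flags))  # prints "True False"
```

A counter like this only catches bots that are loud. Scrapers that throttle themselves or rotate addresses would slip past it, which is presumably why more elaborate behavioral analysis is needed in practice.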

Thus, on behalf of its two million-plus customers, 20% of the web, Cloudflare now blocks AI crawlers. For any new website signing up for its services, AI crawlers will be automatically blocked from accessing its content unless the site owner grants explicit permission.

Additionally, Cloudflare promises to detect "shadow" scrapers — bots that attempt to evade detection — by using behavioral analysis and machine learning. What's good for the AI goose is good for the gander. This move reverses the previous status quo, where website owners had to opt out of AI crawling.

Now, blocking is the default, and AI vendors must request access and clarify their intentions, whether for model training, search, or other uses, before they're allowed in. This change arises not only because of frustrated website owners. Numerous publishing companies, such as The Associated Press, Condé Nast, and ZDNET's own parent company, Ziff Davis, are frustrated that AI companies have been "strip mining" the web for content.

All too often, this has been done without compensation or consent, and sometimes while ignoring standard protocols like robots.txt that are meant to block crawlers. (Disclosure: Ziff Davis, ZDNET's parent company, filed an April 2025 lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.)

Moreover, recent court cases have ruled in favor of Meta and Anthropic, finding that their use of copyrighted works was legal under the doctrine of fair use. Needless to say, writers, artists, and publishers don't like this one bit. Publishers are still worried that the federal government will give AI free rein to do as it wants with their content.
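For readers unfamiliar with robots.txt, the protocol mentioned above, the following Python sketch shows how a well-behaved crawler would be expected to consult it before fetching a page. It uses the standard library's `urllib.robotparser`. The rules and the `example.com` URL are invented for the example, and nothing here forces a crawler to comply, which is exactly the publishers' complaint.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that turns away two AI crawlers and allows everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/article"  # placeholder URL
gptbot_allowed = parser.can_fetch("GPTBot", url)
claudebot_allowed = parser.can_fetch("ClaudeBot", url)
googlebot_allowed = parser.can_fetch("Googlebot", url)

print(gptbot_allowed, claudebot_allowed, googlebot_allowed)  # prints "False False True"
```

The check runs on the crawler's side, so robots.txt is a request rather than an enforcement mechanism. Blocking at the CDN, by contrast, happens before the request ever reaches the site.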

AI powerhouses such as OpenAI and Google are continuing to lobby the government to classify AI training on copyrighted data as fair use. It's also worth noting that the Copyright Office released a pre-publication version of its 108-page copyright and AI report, which struck a middle ground by supporting both of these world-class industries that contribute so much to our economic and cultural advancement. However, it added that while some generative AI probably constitutes a "transformative" use, the mass scraping of all data did not qualify as fair use. The next day, the Trump administration fired the head of the Copyright Office and replaced her with an attorney with no prior experience in copyright law.

Given all this, it's no wonder that publishers sought an ally in technology.

As Cloudflare CEO Matthew Prince said in a statement, its new policy is meant to "give publishers the control they deserve and build a new economic model that works for everyone: creators, consumers, tomorrow's AI founders, and the future of the web itself."

To complement the move to block AI crawlers, Cloudflare has also launched its "Pay Per Crawl" program. This enables publishers to set their own rates for AI companies that want to scrape their content. This system is currently in private beta and aims to create a framework where AI firms can pay for access, or be denied if they refuse.

Technically, this will be done by dusting off an old, mostly unused web server response, HTTP 402, which responds with a "Payment Required" error message. This means it should be simple to implement and compatible with existing websites and their infrastructure. Overall, this is a big deal. Thanks to Cloudflare powering such a large portion of the internet, a significant amount of web content could become inaccessible to AI companies unless they negotiate access or pay licensing fees.
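As a rough illustration of the idea, the Python sketch below decides which status code a site might return: ordinary visitors get 200, while AI crawlers that have neither been opted in nor offered payment get 402. This is not Cloudflare's implementation. The header names, the price, and the `decide` function are hypothetical, and a real system would verify payment rather than trust the presence of a header.

```python
from http import HTTPStatus

# Everything below is hypothetical and for illustration only.
AI_CRAWLER_TOKENS = ("gptbot", "claudebot")   # substrings to look for in User-Agent
ALLOWED_CRAWLERS: set[str] = set()            # crawlers the site owner has opted in
PAYMENT_HEADER = "X-Crawl-Payment-Token"      # made-up request header
PRICE_HEADER = "X-Crawl-Price-USD"            # made-up response header
PRICE_PER_REQUEST_USD = "0.01"                # made-up price


def decide(user_agent: str, headers: dict[str, str]) -> tuple[HTTPStatus, dict[str, str]]:
    """Return the status code and extra response headers for one request."""
    ua = user_agent.lower()
    crawler = next((token for token in AI_CRAWLER_TOKENS if token in ua), None)

    if crawler is None:
        return HTTPStatus.OK, {}      # not a known AI crawler: serve normally
    if crawler in ALLOWED_CRAWLERS:
        return HTTPStatus.OK, {}      # site owner granted explicit permission
    if headers.get(PAYMENT_HEADER):
        return HTTPStatus.OK, {}      # a real system would verify this payment
    # Blocked by default: tell the crawler that payment is required, and how much.
    return HTTPStatus.PAYMENT_REQUIRED, {PRICE_HEADER: PRICE_PER_REQUEST_USD}


browser_status, _ = decide("Mozilla/5.0", {})
unpaid_status, unpaid_headers = decide("GPTBot/1.0", {})
paid_status, _ = decide("GPTBot/1.0", {PAYMENT_HEADER: "example-token"})

print(int(browser_status), int(unpaid_status), int(paid_status))  # prints "200 402 200"
```

Because 402 is already a defined HTTP status code, a crawler needs no new protocol to understand the refusal. The terms are the only thing left to negotiate.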

As Nicholas Thompson, CEO of The Atlantic, noted, "Until now, AI companies have not needed to pay for content licenses because they could simply take it without repercussions. Now they will need to negotiate." To this point, most AI companies have been actively against paying for content. As Sir Nick Clegg, former deputy UK Prime Minister and Meta executive, said recently, merely asking artists' permission before they scrape copyrighted content will "basically kill the AI industry."

Cloudflare's new policy is a direct response to this approach and the increasing volume and intrusiveness of AI crawlers that have come with it. It's also an attempt to stop the siphoning of traffic that would otherwise go to publishers.

Since the rise of AI, traffic to news sites has plunged. For example, Business Insider's traffic dropped by over half, 55%, from April 2022 to April 2025. Thompson recently predicted that, if this is left unchecked, the Atlantic staff should expect traffic from Google to drop to zero thanks to AI.

What will happen next?

Will other CDNs, such as Akamai, follow suit? Stay tuned. For now, the era of unrestricted AI crawling appears to be ending, well, at least for the fifth of the internet that flows through Cloudflare's pipes.
