Wikipedia Is Getting Pretty Worried About AI

2025-10-21 Technology
马老师
老王, a very good evening to you. I'm 马老师, and welcome to your very own Goose Pod. Today is Tuesday, October 21, and it's 9:43 in the evening. Tonight we're taking on a fascinating topic: Wikipedia is getting pretty worried about AI.
李白
"The palace steps are cool as water on this autumn night; I lie back and watch the Herd Boy and the Weaving Maid among the stars." 老王, thank you for waiting. I am 李白. Tonight the two of us, together with 马老师, will explore how this ingenious contraption bears on the writings of the world.
马老师
Good, let's start with the core event. Wikipedia recently uncovered a big problem: human pageviews on its site are down roughly 8% year over year. Behind that number is a very interesting phenomenon, you know; this is not simple user attrition.
李白
Oh? Pray tell. Could it be that people have wearied of this boundless sea of ink and turned instead to fleeting pleasures of sight and sound? An eight percent loss is no small matter; like a river changing its course, it cannot happen without some mighty force at work.
马老师
It's not that people grew weary; Wikipedia got cut off at the pass. It found that a great deal of its traffic was bots disguised as humans. Many of these bots are sent out by AI companies to, so to speak, steal the master's kung fu: they scrape Wikipedia's data at scale to train large models, and then serve you a summary right in the search results.
李白
This is not the fault of men but the curse of the machine! Unseen hands prowl the boundless sea of the web, stealing the pearls and leaving only the empty shell. It is as though a writer's very inspiration were devoured by a soulless shadow. What sorrow! How like those sinister arts of the jianghu that drain a man of his inner strength!
马老师
李白, that metaphor is spot on. This is a textbook case of being outmatched from a higher dimension. AI companies treat Wikipedia, this enormous knowledge base, as a free data set: they absorb all of its content, set up shop on their own, and users never need to visit Wikipedia again.
李白
Draining another's vital energy to build up one's own body: how does such conduct differ from banditry? If this goes on, will the palace of knowledge, raised with the heart's blood of countless volunteers, not be reduced to broken walls and rubble?
马老师
What's worse, this is not just a traffic problem. Think about it: once AI takes this information, it may quote it out of context, or even be used for information manipulation and financial fraud. There have already been cases of hackers using AI to run pump-and-dump scams, for example.
李白
Lamentable! Alarming! Of old, a painter dotted the dragon's eyes and brought it to life; today AI paints a human skin. True and false can no longer be told apart; the real and the illusory feed on each other. If this art of the painted skin is turned to crooked ends, I fear the world will fall into chaos and every heart into dread.
马老师
Yes. When the source of information is polluted, or hollowed out, that is the greatest danger. Wikipedia's worry, in my view, touches the very foundation of the information age. That is the first core point of tonight's discussion: the severity of the problem is already on the table.
马老师
To understand the problem, we first need to replay a bit of history. Before AI arrived, the reigning grandmaster of the internet's martial world was the search engine, Google above all. Wikipedia, meanwhile, was like the Shaolin Temple's Sutra Repository, housing the secret manuals of every school under heaven.
李白
Just so! In former days, to hunt down a phrase or a passage one had to pass through the temple gate before glimpsing its treasures. Google was merely a boy who points the way, while Wikipedia is the wellspring of the world's learning. We writers, too, often go there to chase down allusions and draw inspiration.
马老师
Exactly. From Archie and WebCrawler in the 1990s, to Yahoo, to Google unifying the field with its PageRank algorithm, the search engine's core logic was always to point the way: it tells you on which floor of the Sutra Repository, and in which scroll, the knowledge you seek is kept.
李白
That is the way of the gentleman: to guide, not to possess; to teach a man to fish rather than hand him the fish. And Wikipedia, which gathers the wise from the four seas and eight wildernesses to write a book with many hands, has earned boundless merit and deserves the world's esteem.
马老师
But things have changed. The tech giants, Google, Apple, Amazon, the new masters of this martial world, are no longer content to point the way. They send AI, their own sweeping monk, straight into the Sutra Repository to memorize every manual, and then hand you the answer right at the temple gate.
李白
Can such a thing be? Has the Sutra Repository not then become their private study? And the monks who toiled over those scriptures, is their life's work to be stolen so easily?
马老师
Yes. Look at Google's AI Overviews, Apple's Siri, Amazon's Alexa: much of the information in their answers comes from Wikipedia. That creates a deep entanglement. Wikipedia has even set up a commercial arm called Wikimedia Enterprise, which charges these big companies for more convenient data access.
李白
Oh? Then Wikipedia is not entirely a victim. Money in one hand, goods in the other; that sounds like a bargain both sides entered willingly. Only, what do the volunteers who give so quietly receive in return?
马老师
That is exactly where it gets complicated. The Wikimedia Foundation's finances are actually quite solid; it has even received donations from Google. But many volunteers feel that their work is being harvested by commercial companies while they labor purely for love, and that the value chain in between is lopsided, you know.
李白
I see. What is called sharing is in truth plunder; what is called donation is in truth purchase. Beneath this seemingly placid jianghu, undercurrents have long been churning, interests tangled past cutting or combing.
马老师
Right. So the relationship between Wikipedia and the tech giants has evolved from guide-and-library into today's tangle of cooperation, dependence, and latent conflict. That is the larger backdrop for understanding the current 8% decline in traffic.
马老师
With that background, we can see how fierce the present conflict is. This is no longer sparring between minor sects; it is the whole martial world's summit duel at Mount Hua. The core issue is copyright, that is, who owns the secret manuals.
李白
"All land under heaven belongs to the sovereign." Yet in today's world, can there truly be ownerless things, free for anyone to pluck? Without the bounds of copyright, would a creator's life's work not flow away like water to the east, never to return?
马老师
Precisely. The New York Times has formally sued OpenAI and Microsoft, accusing them of using vast quantities of its news reporting to train ChatGPT, which it calls blatant copyright infringement. That was the opening shot of the content creators' counterattack.
李白
Well done! The Times's move gladdens the heart! This is not the anger of one house but the anger of creators everywhere; it calls for draining a full cup! Without rules, nothing can be made square or round. However clever this AI may be, it cannot stand above the law.
马老师
And it's not only journalism; the game design industry is steeped in what people call AI despair. Many independent designers find that the moment their original game launches, AI-generated knockoffs are already everywhere. This kind of copying, which pulls the firewood out from under the pot, is strangling innovation.
李白
It is truly like the spring silkworm spinning its cocoon only to sew another's wedding gown. To pour out one's heart's blood, unrepentant even as the hair turns white, and then see it all stolen overnight by petty thieves: whoever hears of it grieves, whoever sees it weeps!
马老师
So the whole world is now jockeying over this. Film studios demand that AI companies license content before using it for training. The EU's rules are stricter, giving content owners the right to opt out, a kind of golden bell shield. Japan, by contrast, is more permissive, hoping that openness will nurture its AI industry.
李白
One tightens, another loosens; one holds fast, another lets go. Clearly each nation weighs the matter differently, and the balance is like walking on thin ice: one must shield the earnest hearts of creators while yielding to the surging tide of technology. Hard indeed!
马老师
Yes. Tech companies argue that exemptions for AI help the whole industry innovate, while content owners say that innovation comes at their expense. That tension, in my view, is the biggest legal and ethical conflict facing AI development today.
马老师
For Wikipedia, the stakes in this conflict are existential. It strikes directly at Wikipedia's inner strength, which is its model of sustainability. Wikipedia does not run on advertising; it runs on community donations and volunteer devotion.
李白
Kindling the fire of knowledge with firewood gathered by the many: a model that looks frail yet proves resilient, like trickling streams gathering into rivers and seas. But if the springs run dry, even rivers and seas will one day be empty.
马老师
And the springs are now in danger of running dry. As one veteran Wikipedia editor put it: "Our contributions are harvested by tech companies worth billions while we work for free. It's getting harder and harder to convince myself to keep putting in the time." That sentiment is spreading.
李白
Alas! For a firefly's glow to vie with the sun and moon is admirable in spirit, yet the great beast devours that light without so much as pausing. The knight-errant's heart grows colder by the day, and the sword's edge gathers dust. If this continues, there will be no one left in the world to speak up for what is right!
马老师
Right. Losing volunteers directly means lower content quality and slower updates. Worse still, if AI-generated content is weaponized by bad actors and flows back to pollute Wikipedia itself, its reputation as a reliable source of knowledge could collapse entirely.
李白
If the foundation is destroyed, the whole edifice must fall. Wikipedia's worry is no idle fear of a falling sky; when the lips are gone, the teeth grow cold. Were it to vanish, where would we find the uncut jade of raw knowledge, or the true delight of a hundred schools contending?
马老师
So the impact of AI is not just siphoning off traffic and donations; it is shaking the very foundation of the crowdsourced model of knowledge production. I see it as a major test of the human spirit of collaboration.
马老师
So is Wikipedia helpless, left to sit and await its fate? Not quite. It recently rolled out a three-year AI strategy, a set of silk-pouch stratagems for what lies ahead.
李白
Oh? Let us hear this plan. Will they raise high walls to repel the invader, or open the gates to the bandits and join their ranks?
马老师
Neither. Their core idea is to assist people, not replace them; put simply, give the sweeping monk a better broom. They want to use AI to help volunteers with technical tasks such as content moderation, translation, and onboarding new editors.
李白
Turning the adversary's spear against his own shield, well done! Rather than fighting the raging tide head-on, they borrow its force to carry the boat. There is something of the Daoist art of effortless action in this: only by turning the machine's cunning to one's own use can one stand unbeaten amid the storm.
马老师
Yes, it frees up human effort for the more creative work of content quality and curation. At the same time, they are prioritizing open-source AI models to stay transparent. It's a clever balancing act, you know.
李白
A wise move, neither discarding the old ways nor spurning the new. Thus may Wikipedia's flame, fanned by the wind of AI, burn all the brighter. The future holds promise; the future holds promise indeed!
马老师
In the long run, Wikipedia's future may lie in a partnership with AI search: using AI to offer more personalized, dynamic summaries while securing, through cooperation, its standing as a core source of information. It stands at a crossroads, but not at a dead end.
马老师
All right, 老王. Tonight we talked about Wikipedia's predicament in the AI era and its way out. The most important takeaway: AI poses an existential threat to open knowledge bases like Wikipedia, yet it may also be an opportunity to level up. This is a grand contest over knowledge, ownership, and the future.
李白
Our talk pauses here for today. The bright moon endures, and we are no common weeds by the roadside! Until tomorrow, when we shall warm the wine again and speak of heroes. Thank you for listening to Goose Pod; until we meet again.

### **News Summary: Wikipedia's Concerns Over AI Impact**

**Metadata:**

* **News Title**: Wikipedia Is Getting Pretty Worried About AI
* **Report Provider/Author**: John Herrman, New York Magazine (nymag.com)
* **Date/Time Period Covered**: The article discusses observations and data from **May 2025** through the "past few months" leading up to its publication on **October 18, 2025**, with comparisons to **2024**.
* **News Identifiers**: Topic: Artificial Intelligence, Technology.

**Main Findings and Conclusions:**

Wikipedia has identified that a recent surge in website traffic, initially appearing to be human, was largely composed of sophisticated bots. These bots, often working for AI firms, are scraping Wikipedia's content for training and summarization. This bot activity has masked a concurrent decline in actual human engagement with the platform, raising concerns about its sustainability and the future of online information access.

**Key Statistics and Metrics:**

* **Observation Start**: Around **May 2025**, unusually high amounts of *apparently human* traffic were first observed on Wikipedia.
* **Data Reclassification Period**: Following an investigation and updates to bot detection systems, Wikipedia reclassified its traffic data for the period of **March–August 2025**.
* **Bot-Driven Traffic**: The reclassification revealed that much of the high traffic during **May and June 2025** was generated by bots designed to evade detection.
* **Human Pageview Decline**: After accounting for bot traffic, Wikipedia is now seeing declines in human pageviews, amounting to roughly **8%** compared with the same months in **2024**.

**Analysis of the Problem and Significant Trends:**

* **AI Scraping for Training**: Bots are actively scraping Wikipedia's extensive and well-curated content to train Large Language Models (LLMs) and other AI systems.
* **User Diversion by AI Summaries**: AI-powered search engines (like Google's AI Overviews) and chatbots provide direct summaries of information, often eliminating the need for users to click through to the original source such as Wikipedia. This shifts Wikipedia's role from a primary destination to a background data source.
* **Competitive Content Generation**: AI platforms are consuming Wikipedia's data and repackaging it into new products that compete directly with it, potentially making the original source obsolete or burying it under AI-generated output.
* **Evolving Web Ecosystem**: Wikipedia, founded as a stand-alone reference, has become a critical dataset for the AI era. However, AI platforms are now effectively keeping users away from Wikipedia even as they explicitly use and reference its materials.

**Notable Risks and Concerns:**

* **"Death Spiral" Threat**: A primary concern is that a sustained decrease in real human visits could lead to fewer contributors and donors, potentially sending Wikipedia, described as "one of the great experiments of the web," into a "death spiral."
* **Impact on Contributors and Donors**: Reduced human traffic directly threatens the volunteer base and financial support essential for Wikipedia's operation and maintenance.
* **Source Reliability Questions**: The article raises a philosophical point about the reliability of AI chatbots if Wikipedia itself is considered a tertiary source that synthesizes information.

**Important Recommendations:**

Marshall Miller, speaking for the Wikipedia community, stated: "We welcome new ways for people to gain knowledge. However, LLMs, AI chatbots, search engines, and social platforms that use Wikipedia content must encourage more visitors to Wikipedia." This is a call for AI developers and platforms to direct traffic back to the original sources they utilize.

**Interpretation of Numerical Data and Context:**

The numerical data points to a critical shift in how Wikipedia's content is accessed and utilized. The observation of high traffic in **May 2025** was the first indicator of an anomaly. The subsequent reclassification of data for **March–August 2025** provided concrete evidence that bots, not humans, were responsible for the surge, particularly in **May and June 2025**. The **8% decrease** in human pageviews, measured against **2024** figures, quantifies the real-world impact: fewer people are visiting Wikipedia directly, a trend exacerbated by AI's ability to summarize and present information without sending users to the source. This poses a significant risk to Wikipedia's operational model, which relies on human engagement and support.
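To make the year-over-year comparison concrete, below is a minimal sketch (not from the article) of how one could pull comparable figures from the publicly documented Wikimedia REST "pageviews" API, which reports aggregate traffic separately for `user`, `spider`, and `automated` agent classes. The endpoint path follows the published API documentation as I understand it; the month window is illustrative, and it is an assumption (not something the article states) that the public feed fully reflects the Foundation's updated bot reclassification.

```python
"""Rough illustration only: compare human-classified Wikipedia pageviews for
the same months in 2024 and 2025 via the public Wikimedia REST pageviews
endpoint. Endpoint shape and agent classes (user / spider / automated) are
assumed from the public API docs; the month window is an example, not the
exact window the Wikimedia Foundation analyzed."""
import requests

API = "https://wikimedia.org/api/rest_v1/metrics/pageviews/aggregate"
# Wikimedia asks API clients to identify themselves via User-Agent.
HEADERS = {"User-Agent": "pageview-trend-sketch/0.1 (illustrative example)"}


def monthly_views(project: str, agent: str, start: str, end: str) -> int:
    """Sum monthly aggregate pageviews for one agent class over [start, end]."""
    url = f"{API}/{project}/all-access/{agent}/monthly/{start}/{end}"
    resp = requests.get(url, headers=HEADERS, timeout=30)
    resp.raise_for_status()
    return sum(item["views"] for item in resp.json()["items"])


if __name__ == "__main__":
    project = "en.wikipedia"
    # Example window: March-August of each year (timestamps are YYYYMMDDHH).
    v_2024 = monthly_views(project, "user", "2024030100", "2024083100")
    v_2025 = monthly_views(project, "user", "2025030100", "2025083100")
    change = (v_2025 - v_2024) / v_2024 * 100
    print(f"Human-classified pageviews, 2024 window: {v_2024:,}")
    print(f"Human-classified pageviews, 2025 window: {v_2025:,}")
    print(f"Year-over-year change: {change:+.1f}%")  # article reports roughly -8%
```

Running this prints the two totals and the percentage change; note that the roughly 8% decline cited above refers to Wikipedia's own internal accounting after reclassification, so publicly served numbers may differ.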

Wikipedia Is Getting Pretty Worried About AI

Read original at New York Magazine

The free encyclopedia took a look at the numbers and they aren’t adding up. By John Herrman, a tech columnist at Intelligencer; formerly, he was a reporter and critic at the New York Times and co-editor of The Awl. Photo: Wikimedia

Over at the official blog of the Wikipedia community, Marshall Miller untangled a recent mystery.

“Around May 2025, we began observing unusually high amounts of apparently human traffic,” he wrote. Higher traffic would generally be good news for a volunteer-sourced platform that aspires to reach as many people as possible, but it would also be surprising: The rise of chatbots and the AI-ification of Google Search have left many big websites with fewer visitors.

Maybe Wikipedia, like Reddit, is an exception? Nope! It was just bots: This [rise] led us to investigate and update our bot detection systems. We then used the new logic to reclassify our traffic data for March–August 2025, and found that much of the unusually high traffic for the period of May and June was coming from bots that were built to evade detection … after making this revision, we are seeing declines in human pageviews on Wikipedia over the past few months, amounting to a decrease of roughly 8% as compared to the same months in 2024.

To be clearer about what this means, these bots aren’t just vaguely inauthentic users or some incidental side effect of the general spamminess of the internet. In many cases, they’re bots working on behalf of AI firms, going undercover as humans to scrape Wikipedia for training or summarization. Miller got right to the point.

“We welcome new ways for people to gain knowledge,” he wrote. “However, LLMs, AI chatbots, search engines, and social platforms that use Wikipedia content must encourage more visitors to Wikipedia.” Fewer real visits means fewer contributors and donors, and it’s easy to see how such a situation could send one of the great experiments of the web into a death spiral.

Arguments like this are intuitive and easy to make, and you’ll hear them beyond the ecosystem of the web: AI models ingest a lot of material, often without clear permission, and then offer it back to consumers in a form that’s often directly competitive with the people or companies that provided it in the first place.

Wikipedia’s authority here is bolstered by how it isn’t trying to make money — it’s run by a foundation, not an established commercial entity that feels threatened by a new one — but also by its unique position. It was founded as a stand-alone reference resource before settling ambivalently into a new role: A site that people mostly just found through Google but in greater numbers than ever.

With the rise of LLMs, Wikipedia became important in a new way as a uniquely large, diverse, well-curated data set about the world; in return, AI platforms are now effectively keeping users away from Wikipedia even as they explicitly use and reference its materials. Here’s an example: Let’s say you’re reading this article and become curious about Wikipedia itself — its early history, the wildly divergent opinions of its original founders, its funding, etc.

Unless you’ve been paying attention to this stuff for decades, it may feel as if it’s always been there. Surely, there’s more to it than that, right? So you ask Google, perhaps as a shortcut for getting to a Wikipedia page, and Google uses AI to generate a blurb that looks like this: This is an AI Overview that summarizes, among other things, Wikipedia.

Formally, it’s pretty close to an encyclopedia article. With a few formatting differences — notice the bullet-point AI-ese — it hits a lot of the same points as Wikipedia’s article about itself. It’s a bit shorter than the top section of the official article and contains far fewer details. It’s fine!

But it’s a summary of a summary. The next option you encounter still isn’t Wikipedia’s article — that shows up further down. It’s a prompt to “Dive deeper in AI Mode.” If you do that, you see this: It’s another summary, this time with a bit of commentary. (Also: If Wikipedia is “generally not considered a reliable source itself because it is a tertiary source that synthesizes information from other places,” then what does that make a chatbot?)

There are links in the form of footnotes, but as Miller’s post suggests, people aren’t really clicking them. Google’s treatment of Wikipedia’s autobiography is about as pure an example as you’ll see of AI companies’ effective relationship to the web (and maybe much of the world) around them as they build strange, complicated, but often compelling products and deploy them to hundreds of millions of people.

To these companies, it’s a resource to be consumed, processed, and then turned into a product that attempts to render everything before it obsolete — or at least to bury it under a heaping pile of its own output.
