Large language models are fueling a boom in scientific papers, but quality is a concern: researchers who adopt AI produce far more preprints, yet those papers are less likely to be published. The usual link between polished language and scientific merit breaks down, complicating quality assessment and adding strain to peer review.
LLMs’ impact on science: Booming publications, stagnating quality
There have been a number of high-profile cases where scientific papers have had to be retracted because they were filled with AI-generated slop—the most recent coming just two weeks ago. These instances raise serious questions about the quality of peer review in some journals—how could anyone let a figure with terms like “runctitional,” “fexcectorn,” and “frymblal” through, especially given the ‘m’ in frymblal has an extra hump?
But it has not been clear whether these high-profile examples are representative. How significantly has AI use been influencing the scientific literature?

A collaboration of researchers at Berkeley and Cornell has decided to take a look. They've scanned three of the largest archives of pre-publication papers and identified ones that are likely to have been produced using Large Language Models.
And they found that, while researchers produced far more papers after starting to use AI and the quality of their language improved, the publication rate of these papers dropped.

Searching the archives

The researchers began by obtaining the abstracts of everything placed in three major pre-publication archives between 2018 and mid-2024.
At the arXiv, this netted them 1.2 million documents; another 675,000 came from the Social Science Research Network; and bioRxiv provided a further 220,000. So this was a lot of material to work with, and it covered many different fields of research. It also included documents submitted before Large Language Models were likely to be able to produce output that would be deemed acceptable.
The researchers took the abstracts from the pre-ChatGPT period and trained a model to recognize the statistical signature of human-written text. Those same abstracts were then fed into GPT-3.5, which rewrote them, and the same training process was repeated on the rewrites. The resulting model could then estimate whether a given abstract was more likely to have been produced by an AI or an actual human.
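To make the approach concrete, here's a minimal sketch of that kind of detector, assuming a simple bag-of-words classifier trained on paired human and LLM-rewritten abstracts. The toy data, features, and model choice are illustrative assumptions; the paper's actual detector is more sophisticated.

```python
# Minimal sketch: distinguish human-written abstracts from LLM
# rewrites of those same abstracts. The toy data and the choice of
# a TF-IDF + logistic regression classifier are assumptions for
# illustration, not the study's actual pipeline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Pre-ChatGPT abstracts (known human) and GPT-3.5 rewrites of the
# same abstracts (known AI); the study used hundreds of thousands.
human_abstracts = [
    "We measure the decay rate of excited states in ...",
    "This study examines labor market outcomes following ...",
]
llm_rewrites = [
    "In this work, we comprehensively measure the decay rate of ...",
    "This study delves into the multifaceted labor market outcomes ...",
]

texts = human_abstracts + llm_rewrites
labels = [0] * len(human_abstracts) + [1] * len(llm_rewrites)  # 0 = human, 1 = LLM

# Word n-gram statistics stand in for the "statistics of
# human-generated text" the classifier learns.
detector = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
detector.fit(texts, labels)

# Estimated probability that a new abstract is LLM-produced.
print(detector.predict_proba(["We delve into the intricate dynamics of ..."])[:, 1])
```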
The research team then used this to identify a key transition point: when a given author at one of these archives first started using an LLM to produce a submission. They then compared the researchers’ prior productivity to what happened once they turned to AI. “LLM adoption is associated with a large increase in researchers’ scientific output in all three preprint repositories,” they conclude.
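A minimal sketch of that comparison, assuming a per-author table of dated submissions with detector scores; the column names, the 0.5 threshold, and the data are illustrative assumptions:

```python
# Sketch: find each author's first LLM-flagged submission, then
# count submissions before and after that adoption date. Column
# names, scores, and the 0.5 cutoff are illustrative assumptions.
import pandas as pd

subs = pd.DataFrame({
    "author": ["a1", "a1", "a1", "a2", "a2"],
    "date": pd.to_datetime(
        ["2021-03-01", "2023-01-15", "2023-06-01", "2022-05-10", "2024-02-20"]
    ),
    "llm_score": [0.1, 0.8, 0.9, 0.2, 0.7],  # detector's P(LLM-assisted)
})

# Earliest date an author's submission looks LLM-assisted.
adopted = (
    subs.loc[subs["llm_score"] > 0.5]
    .groupby("author")["date"].min()
    .rename("adopted")
)
subs = subs.join(adopted, on="author")

# Label each submission relative to the author's adoption date.
subs["period"] = (subs["date"] >= subs["adopted"]).map({True: "post", False: "pre"})
print(subs.groupby(["author", "period"]).size())
```

The study's actual analysis is more careful than raw counts, but the core move is the same: pin down each author's first LLM-flagged submission and compare output on either side of it.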
This effect was likely to be most pronounced among people who weren't native speakers of English. When the researchers limited the analysis to people with Asian names working at institutions in Asia, the rate of submissions to bioRxiv and SSRN nearly doubled once those researchers started using AI, and it rose by over 40 percent at the arXiv.
This suggests that people who may not have the strongest English skills are using LLMs to overcome a major bottleneck: producing compelling text.

Quantity vs. quality

The value of producing compelling text should not be underestimated. “Papers with clear but complex language are perceived to be stronger and are cited more frequently,” the researchers note, suggesting that we may use the quality of writing as a proxy for the quality of the research it's describing.
And they found some indication of that here, as non-LLM-assisted papers were more likely to be published in the peer-reviewed literature if they used complex language (the abstracts were scored for language complexity using a couple of standard measures).

But the dynamic was completely different for LLM-produced papers.
The complexity of language in papers written with an LLM was generally higher than in those written without one. But they were less likely to end up being published. “For LLM-assisted manuscripts,” the researchers write, “the positive correlation between linguistic complexity and scientific merit not only disappears, it inverts.”
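As a rough illustration of that kind of analysis, here's a sketch that scores abstracts with a standard readability formula and checks how complexity correlates with publication, separately for human- and LLM-assisted papers. The Flesch-Kincaid measure (via the textstat package) and the toy data are assumptions; the article doesn't name the exact measures used.

```python
# Sketch: correlate linguistic complexity with publication outcome,
# split by whether a paper was flagged as LLM-assisted. The choice
# of Flesch-Kincaid grade (via textstat) and the toy data are
# illustrative assumptions.
import pandas as pd
import textstat

papers = pd.DataFrame({
    "abstract": [
        "We prove a simple bound on the error of ...",
        "A short note on convergence rates for ...",
        "We give an elementary proof that ...",
        "This paper delves into the multifaceted interplay of ...",
        "We comprehensively elucidate the intricate mechanisms ...",
        "An exhaustive investigation of the nuanced dynamics of ...",
    ],
    "llm_assisted": [False, False, False, True, True, True],
    "published": [1, 0, 1, 0, 1, 0],
})

# Higher grade level = more complex language.
papers["complexity"] = papers["abstract"].map(textstat.flesch_kincaid_grade)

# Correlation between complexity and publication, per group; the
# study found this relationship flips sign for LLM-assisted papers.
for flag, group in papers.groupby("llm_assisted"):
    r = group["complexity"].corr(group["published"].astype(float))
    print(f"LLM-assisted={flag}: corr(complexity, published) = {r:.2f}")
```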
But not all of the differences were bleak. When the researchers checked the references being used in AI-assisted papers, they found that the LLMs weren't just citing the same papers that everyone else did. They instead cited a broader range of sources, and were more likely to cite books and recent papers.
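One crude way to quantify "citing the same papers everyone else did" is to measure how much of a reference list falls within the field's most-cited works. A sketch under that assumption; the toy corpus, reference lists, and cutoff are all illustrative, not the study's actual metric:

```python
# Sketch: fraction of a paper's references drawn from the field's
# most-cited works; lower values mean a broader range of sources.
# The toy corpus and the "top-3 canon" cutoff are assumptions.
from collections import Counter

# Reference lists (by ID) for a toy corpus of papers in one field.
corpus_refs = [
    ["r1", "r2", "r3"],
    ["r1", "r2", "r4"],
    ["r1", "r3", "r5"],
]

# The field's most commonly cited works.
counts = Counter(ref for refs in corpus_refs for ref in refs)
canon = {ref for ref, _ in counts.most_common(3)}

def canon_share(refs):
    """Fraction of a paper's references that are canonical."""
    return len(set(refs) & canon) / len(refs)

print(canon_share(["r1", "r2", "r3"]))  # cites the usual papers -> 1.0
print(canon_share(["r6", "r7", "r1"]))  # broader range of sources -> ~0.33
```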
So, there's a chance that AI use could ultimately diversify the published research that other researchers consider (assuming they check their own references, which they clearly should).

What does this tell us?

There are a couple of cautions for interpreting these results. One, acknowledged by the researchers, is that people may be using AI to produce initial text that's then heavily edited, and that text may be mislabeled as human-produced here.
So the overall prevalence of AI use is likely to be higher. The other is that some manuscripts may take a while to get published, so using publication as a standard for scientific quality may penalize more recent drafts, which are more likely to involve AI use. These issues may ultimately bias some of the results, but the effects the authors saw were so large that they're unlikely to go away entirely.
Beyond those cautions, the situation these results describe is a bit mixed. On the plus side, the ability of AIs to help researchers express their ideas could help more scientific work come to the attention of the wider community. The authors also note that the use of LLMs trained on general language may limit their reliance on jargon, and thus open up scientific disciplines to people with other specializations, potentially enabling new collaborations.
That said, the disconnect between writing quality and scientific quality may make it harder for researchers to take their usual shortcut for estimating a paper's merit. With nothing obvious to replace it, this could pose some significant challenges.

Left completely unmentioned is the issue of how this plays out in the peer review process.
The low cost of starting online-only journals has led to their proliferation, with a corresponding growth in the need for peer reviewers. Editors regularly complain about not getting reviews back in a timely manner, and faculty complain that they're swamped with requests to review papers. If LLMs boost researchers' ability to produce manuscripts for review, the situation is only going to get worse.
In any case, the authors point out that this is an entirely new capability, and we're only just starting to see it put to use. “As models improve and scientists discover new ways to integrate them into their work,” they write, “the future impact of these technologies will likely dwarf the effects that we have highlighted here.”
Science, 2025. DOI: 10.1126/science.adw3000 (About DOIs).

John is Ars Technica's science editor. He has a Bachelor of Arts in Biochemistry from Columbia University, and a Ph.D. in Molecular and Cell Biology from the University of California, Berkeley. When physically separated from his keyboard, he tends to seek out a bicycle, or a scenic location for communing with his hiking boots.