arXiv Imposes One-Year Ban for Unchecked AI-Generated Content, Citing "Hallucinated References"

Ithaca, NY – arXiv, the prominent open-access preprint server, has implemented a new policy imposing a one-year ban on authors whose submissions contain "incontrovertible evidence" of unchecked errors generated by large language models (LLMs), such as hallucinated references. This move aims to curb the increasing influx of low-quality, AI-assisted papers that compromise scientific integrity.

The policy, detailed by Thomas Dietterich, Chair of arXiv's computer science section, states that authors found to have submitted content with clear, unverified LLM-generated mistakes will face a 12-month suspension. Following the ban, any subsequent submissions from these authors must first be accepted by a reputable peer-reviewed venue before they can be posted on arXiv. This effectively creates a significant hurdle for re-entry.

Steinn Sigurðsson, arXiv's scientific director, underscored the necessity of these measures in a recent social media post. "on the whole @arxiv flap about hallucinated references etc," Sigurðsson wrote, adding, "you don't see the stuff we reject... some of it is really really egregious." He explained that "the decision to impose additional consequences is largely to throttle that stuff so n00bs and bad actors don't trash us trying repeatedly."

Examples of such "incontrovertible evidence" include non-existent citations, placeholder text, or meta-comments from LLMs left unedited within submissions. While arXiv does not prohibit the use of AI tools, it emphasizes that authors bear full responsibility for the accuracy and integrity of their content, regardless of how it was generated. The new rule is described as a "one-strike" policy, with a review process for flagging and appealing decisions.

The initiative comes amid growing concern in the scientific community about the integrity of research papers, particularly as generative AI tools become widespread. By deterring negligent use of AI and requiring that submissions meet basic standards of academic rigor, the policy seeks to preserve arXiv's standing as a reliable repository for scholarly communication.