Phys.org web signal May 18th 2026

arXiv, PubMed Central harbor 146,900 AI fake citations

hallucinations ai ethics scientific-integrity research-reproducibility

Key insights

Researchers found 146,900 AI-hallucinated citations across four major platforms, far exceeding prior annual estimates of the problem's scale.
Hallucinated citations mimic real paper metadata, making them structurally harder to detect than fabricated facts and resistant to automated filtering.
The spread across arXiv, bioRxiv, SSRN, and PubMed Central shows the problem extends well beyond any single platform's moderation scope.

Why this matters

At 146,900 hallucinated citations, the citation graph that underlies scientific credibility has been contaminated at a scale that manual review cannot address. Peer review tends to treat cited references as validated prior work, so fake references embedded today will compound as downstream papers cite them. Practitioners building AI tools for literature review, drug discovery, or academic research now face an unquantified risk that fabricated references are already present in the training and retrieval data those tools depend on.

Summary

A new study counted roughly 146,900 AI-hallucinated citations across arXiv, bioRxiv, SSRN, and PubMed Central, putting a hard number on a contamination problem that has outpaced platform-level responses.

The detection gap is structural. Hallucinated citations mimic real paper metadata (author names, journal titles, plausible DOIs), so automated filters designed to catch factual errors miss them entirely. The four platforms each host part of a problem that no single one can contain:

- arXiv's recently announced hallucination ban targets one pipeline; the same citations are spreading across the three other repositories simultaneously.
- Standard citation tools cannot flag references that look structurally valid but point to nonexistent papers.

At 146,900 fake nodes in the citation graph, hallucinated references have become a foundational integrity problem.
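The difference between a reference that looks structurally valid and one that points to a real paper can be illustrated with a minimal Python sketch. It is not the study's method. The regular expression is a commonly used approximation of the DOI shape, and KNOWN_DOIS is a hypothetical in-memory stand-in for an authoritative registry lookup; the DOIs shown are made-up placeholders.

```python
import re

# Approximate DOI shape: "10.", a 4-9 digit registrant prefix, "/", then a suffix.
# This is a common heuristic, not a complete specification of DOI syntax.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Hypothetical stand-in for a registry lookup (illustrative placeholder data only).
KNOWN_DOIS = frozenset({"10.1000/example.2020.001"})


def looks_valid(doi):
    """Structural check only: does the string have the shape of a DOI?"""
    return bool(DOI_PATTERN.match(doi))


def resolves(doi, registry=KNOWN_DOIS):
    """Existence check: is the DOI actually present in the registry?"""
    # DOIs are case-insensitive, so compare in lowercase.
    return doi.lower() in registry


def classify(doi, registry=KNOWN_DOIS):
    """Separate malformed DOIs from well-formed ones that resolve to nothing."""
    if not looks_valid(doi):
        return "malformed"
    if not resolves(doi, registry):
        return "well-formed but unresolved"
    return "resolved"
```

A filter that stops at looks_valid would pass the second category untouched, which is the category the summary describes: references with a plausible shape and no paper behind them.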

Potential risks and opportunities

Risks

Drug discovery and clinical research teams using AI literature synthesis tools (Elicit, Consensus, Semantic Scholar) face an unquantified risk that hallucinated citations have already seeded their training or retrieval corpora
arXiv's announced hallucination ban could create a false sense of containment while bioRxiv and SSRN, lacking similar policies, continue accumulating fake citations at the same rate
Publishers and institutional repositories relying on Crossref DOI validation face reputational and legal exposure if hallucinated citations pass their screening and corrupt the outputs of publicly funded research

Opportunities

Citation verification services (Scite, iThenticate, Retraction Watch) could expand into AI-hallucination detection as a distinct product category with clear institutional demand
Academic publishers (Elsevier, Springer Nature, Wiley) have leverage to mandate citation-verification tooling as a submission requirement, creating a compliance market for reference-validation startups
Startups building reference-validation layers for scientific AI workflows could position against the contaminated-data risk with database-anchored or cryptographic citation verification as a trust primitive
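One possible reading of "database-anchored" verification is to compare a citation's claimed metadata against the record stored for its DOI, rather than checking the DOI alone. The Python sketch below works under that assumption. REGISTRY, its single placeholder record, the field names, and the 0.9 similarity threshold are all hypothetical choices for illustration, not a description of any existing product.

```python
from difflib import SequenceMatcher

# Hypothetical registry keyed by lowercase DOI (placeholder record, not a real paper).
REGISTRY = {
    "10.1000/example.2020.001": {
        "title": "A Placeholder Study of Example Systems",
        "year": 2020,
    },
}


def normalize(text):
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.lower().split())


def title_similarity(a, b):
    """Similarity ratio in [0, 1] between two normalized titles."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def verify(citation, registry=REGISTRY, threshold=0.9):
    """Check a citation dict (doi, title, year) against the registry record."""
    record = registry.get(citation["doi"].lower())
    if record is None:
        return "no record"
    if citation["year"] != record["year"]:
        return "metadata mismatch"
    if title_similarity(citation["title"], record["title"]) < threshold:
        return "metadata mismatch"
    return "verified"
```

The design choice here is that a DOI on its own is not treated as proof: a citation counts as verified only when the title and year it claims agree with the stored record.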

What we don't know yet

The per-platform breakdown, which was not disclosed, including whether PubMed Central's peer-reviewed subset shows a disproportionate concentration compared with preprint-only repositories like bioRxiv and SSRN
Which AI writing tools or model families generated the bulk of detected hallucinations, and whether specific models show distinct citation-hallucination signatures
Whether any of the 146,900 citations have already been cited by subsequent papers, propagating fake references into later literature
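The propagation question could in principle be explored with a graph traversal, if a citation graph and a list of flagged references were available. The Python sketch below assumes both. The cites mapping and the paper identifiers are hypothetical, and the result is an upper bound on exposure: a paper that transitively cites a contaminated paper has not necessarily repeated the fake reference itself.

```python
from collections import deque


def exposed_papers(cites, flagged):
    """Return papers that contain a flagged reference directly, or that
    transitively cite a paper which does.

    cites:   dict mapping each paper ID to the list of IDs it cites
    flagged: iterable of reference IDs identified as hallucinated
    """
    # Invert the graph: for each cited ID, which papers cite it?
    cited_by = {}
    for paper, refs in cites.items():
        for ref in refs:
            cited_by.setdefault(ref, set()).add(paper)

    # Breadth-first search outward from the flagged references.
    seen = set()
    queue = deque(flagged)
    while queue:
        node = queue.popleft()
        for citer in cited_by.get(node, ()):
            if citer not in seen:
                seen.add(citer)
                queue.append(citer)
    return seen
```

For example, if paper A cites a flagged reference, B cites A, and C cites B, all three are returned, while a paper that cites only unflagged work is not.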

Originally reported by Phys.org

Original headline: Study: 146,900 AI-Hallucinated Citations Found Across arXiv, bioRxiv, SSRN, and PubMed Central