"Ghost References: AI Planted 3,000 Fake Citations in Peer-Reviewed Papers"
#ai #academic #research #llm #hallucination

@garagelab | 2026-05-08 12:58:23
A Columbia University School of Nursing audit released this week identified nearly **3,000 fake citations**, references to papers that simply do not exist, embedded across peer-reviewed medical literature. The citations passed editorial review, cleared plagiarism checkers, and were published in indexed journals. The mechanism behind them is not plagiarism. It is hallucination: large language models generating plausible-looking but entirely fabricated bibliographic entries.

## What a "Ghost Reference" Actually Is

A ghost reference is a citation that looks structurally correct, with a journal name, volume number, page range, author list, and DOI, but points to nothing real. The paper it cites was never published and possibly never written. In some cases the DOI resolves to an unrelated article on a different topic. In others it resolves to a 404.

Traditional plagiarism detection tools like iThenticate or Turnitin check for copied text. They do not verify whether the sources being cited are real. This gap is precisely where AI-assisted fabrication slips through.

## The Columbia Audit: Method and Scale

The Columbia team developed an AI-assisted verification pipeline that cross-referenced citations against PubMed, Crossref, Scopus, and Web of Science simultaneously. A citation was flagged as a "ghost" if it:

1. Returned zero matches across all four databases
2. Returned a DOI match where the retrieved paper's abstract was unrelated to the citing paper's stated purpose
3. Had an author list that did not match the institutional affiliations in any indexed record

Across the sample of papers reviewed, spanning 2022 to early 2026, **2,937 citations met at least two of these three criteria** (the flagging rule is sketched in code after the list below). The spike began in late 2023, tracking the spread of LLM writing assistants into research workflows.

Medical and clinical nursing literature was disproportionately affected, likely because:

- Citation volume expectations are high (20–60 references per paper is typical)
- Specialized vocabulary makes LLM output harder to verify quickly
- Some researchers use AI to fill out literature review sections without fully auditing each generated reference
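The audit's actual pipeline has not been published, so what follows is a minimal sketch of the stated two-of-three flagging rule in Python. The `CitationCheck` fields and the `is_ghost` helper are illustrative names standing in for the lookups a real pipeline would run against PubMed, Crossref, Scopus, and Web of Science.

```python
from dataclasses import dataclass

@dataclass
class CitationCheck:
    """Lookup results for one citation. Field names are illustrative,
    not taken from the audit itself."""
    db_matches: int             # matches across PubMed, Crossref, Scopus, Web of Science
    doi_abstract_related: bool  # DOI resolves AND the abstract fits the citing claim
    authors_indexed: bool       # author list matches some indexed record

def is_ghost(c: CitationCheck) -> bool:
    """Apply the audit's stated rule: flag a citation when it meets
    at least two of the three criteria."""
    criteria = [
        c.db_matches == 0,           # 1. zero matches in all four databases
        not c.doi_abstract_related,  # 2. DOI points at an unrelated paper
        not c.authors_indexed,       # 3. authors match no indexed record
    ]
    return sum(criteria) >= 2

# A citation whose DOI resolves to an unrelated paper and whose authors
# appear in no index meets criteria 2 and 3 -> flagged as ghost.
suspect = CitationCheck(db_matches=1, doi_abstract_related=False, authors_indexed=False)
print(is_ghost(suspect))  # True
```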
## Why LLMs Generate Fake Citations

This is a structural problem with how large language models work, not a bug that can be patched. LLMs are trained to predict plausible next tokens. A plausible academic reference looks like a journal entry with all the right fields. The model has seen thousands of entries in this format and will generate one confidently when asked. Whether the paper actually exists is not a question the model can answer: it has no live database connection and no concept of "real."

GPT-4 and Claude, when prompted to provide citations for a specific claim, will frequently produce references that are structurally perfect but factually nonexistent. Studies testing this have found hallucinated citation rates of **23–46%**, depending on the domain and prompt specificity. Medical and scientific domains, where the model has been trained on less freely available text, tend to produce higher hallucination rates than computer science.

## The Detection Gap Is Getting Worse

Publishers and journals are behind the curve for three reasons:

**1. Volume**: Over 4 million scientific articles are published annually across indexed journals. Manual citation auditing at this scale is not feasible.

**2. Sophistication**: Earlier AI-generated text was detectable via tools like GPTZero. Newer models produce prose that evades detection far more reliably. The game has moved from detecting AI text to verifying AI-generated facts, a much harder problem.

**3. Incentive misalignment**: Journals charge publication fees ranging from $1,500 to $11,000 per paper (APCs in open-access models). Retraction is expensive and reputationally damaging for all parties. There is structural pressure not to find problems.

## What Legitimate Researchers Are Doing

Several practices are emerging as standard risk management:

- **Reverse-lookup every citation**: Before submission, run each reference through CrossRef's REST API or Semantic Scholar to verify it resolves (a minimal sketch follows this list). With the right tooling this takes about two minutes per citation.
- **DOI-first workflow**: Write your argument, then find real papers to support it. Do not ask an LLM to suggest citations; use it for drafting prose only.
- **Institutional checklists**: Some universities now require authors to certify that each citation was accessed directly before submission.
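Here is a minimal version of that reverse lookup, assuming Python with the `requests` library. The endpoint is Crossref's public REST API; the contact address in the User-Agent and both example DOIs are placeholders you would swap for your own.

```python
import requests

CROSSREF = "https://api.crossref.org/works/"
# Crossref's "polite pool" asks for a contact address in the User-Agent;
# the mailto below is a placeholder.
HEADERS = {"User-Agent": "citation-check/0.1 (mailto:you@example.org)"}

def lookup_doi(doi: str) -> str | None:
    """Return the indexed title for a DOI, or None if Crossref has no
    record of it (the API answers 404 for unknown DOIs)."""
    resp = requests.get(CROSSREF + doi, headers=HEADERS, timeout=10)
    if resp.status_code != 200:
        return None
    titles = resp.json()["message"].get("title", [])
    return titles[0] if titles else None

if __name__ == "__main__":
    checks = [
        "10.1038/171737a0",            # Watson & Crick (1953), a known-real DOI
        "10.9999/this-never-existed",  # deliberately fake
    ]
    for doi in checks:
        title = lookup_doi(doi)
        print(f"{doi} -> {title if title else 'GHOST: no Crossref record'}")
```

Note that a 200 response only proves the DOI exists in Crossref; you still need to confirm the returned title matches the paper you meant to cite, which is what catches the unrelated-DOI variant of a ghost reference.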
The Columbia team is pushing for a **mandatory citation verification layer** as part of journal submission systems: essentially an automated pre-check that rejects papers with ghost references before they reach peer review.

## The Deeper Problem

The ghost citation crisis is a symptom of a more structural shift: researchers are using AI as a writing tool in contexts where the tool's outputs cannot be trusted at face value. The problem is not that AI writes well. It is that it writes *confidently incorrectly*.

Science's authority rests on the traceability of claims back to evidence. A citation is not decoration; it is the evidence chain. When citations are fabricated, the chain is broken and the paper's conclusions float free of any empirical foundation. The reader cannot verify, the researcher cannot be held accountable, and the literature gets polluted.

**2,937 ghost references found in one audit of one subdomain suggests that the actual scale, across all of indexed science, is orders of magnitude larger.**