How to stop LLM citation hallucination

Language models are fluent enough to invent a reference that looks completely real: a plausible author, a believable journal, a DOI that resolves to nothing. For research work, a single fabricated reference can turn a useful tool into a liability. Here is why it happens, why retrieval alone does not fix it, and what actually does.

Why models invent citations

A language model predicts text. When you ask for a source, it produces the most statistically likely string that looks like a citation, not a record it has verified exists. The result reads like a reference because the model has seen millions of them, but nothing connects that string to a real document. This is not a bug you can prompt away; it is how generation works.

Why plain RAG is not enough

Retrieval-augmented generation (RAG) helps: you fetch real passages and put them in context before the model answers. But standard RAG still lets the model write the final text freely, so it can blend retrieved facts with invented ones, attribute a claim to the wrong passage, or cite a document it was never actually given. Retrieval improves the odds. It does not guarantee that every citation in the output is real.

The gap is verification. Retrieval puts real sources nearby; nothing forces the model to use only those, and nothing checks the output afterward.
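
To make the gap concrete, here is a minimal Python sketch. The bracketed [ID] citation format, the paper IDs, and the helper name cited_ids_not_in_context are all assumptions made for illustration; they are not taken from any particular RAG library. The sketch shows that a free-text draft can name an ID that was never placed in the context, and that a plain RAG pipeline would not notice unless something like this runs afterward.

```python
import re

# Hypothetical citation format: paper IDs in square brackets, e.g. [P12].
CITATION_PATTERN = re.compile(r"\[([A-Za-z0-9_-]+)\]")


def cited_ids_not_in_context(draft: str, retrieved_ids: set[str]) -> list[str]:
    """Return IDs cited in the draft that were never retrieved,
    in order of first appearance and without duplicates."""
    seen: set[str] = set()
    missing: list[str] = []
    for paper_id in CITATION_PATTERN.findall(draft):
        if paper_id not in retrieved_ids and paper_id not in seen:
            seen.add(paper_id)
            missing.append(paper_id)
    return missing


# IDs of the passages the retriever actually placed in the prompt (illustrative).
retrieved_ids = {"P1", "P2"}

# A free-text draft: two citations are grounded, one was never retrieved.
draft = "Claim one [P1]. Claim two, with an extra source [P2][P9]."

print(cited_ids_not_in_context(draft, retrieved_ids))  # ['P9']
```

Nothing in the generation step prevents the [P9] citation from being written; it only surfaces once the output is compared against what was retrieved.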

The fix: a deterministic grounding check

medground closes the gap with a step that is not a model at all. After an answer is drafted, every claim must carry a paper ID, and a deterministic function, check_grounding, confirms that each cited ID exists in the corpus and was among the retrieved evidence before the answer is returned. The rules are simple:

  • Each claim is written as a discrete statement with one or more paper IDs.
  • check_grounding flags any claim that is uncited, cites a paper outside the corpus, or cites evidence that was not retrieved.
  • Flagged claims are repaired or dropped. If a claim has no real source, it does not ship.

Because the check is deterministic rather than another language model judging the first one, it cannot be charmed by a confident-sounding answer. A citation either resolves to a real record in the corpus or it does not.
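
The three rules above can be sketched in a few lines of Python. This is not medground's actual code or signature, which this article does not show: the Claim and Flag types, the function names check_grounding_sketch and drop_flagged, and the two ID sets passed in are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Flag(Enum):
    UNCITED = "uncited"                # claim carries no paper ID
    OUTSIDE_CORPUS = "outside_corpus"  # cited ID is not a corpus record
    NOT_RETRIEVED = "not_retrieved"    # ID is in the corpus but was not retrieved


@dataclass(frozen=True)
class Claim:
    """Hypothetical claim shape: one statement plus its cited paper IDs."""
    text: str
    paper_ids: tuple[str, ...] = ()


def check_grounding_sketch(claims, corpus_ids, retrieved_ids):
    """Return {claim index: [flags]} for every claim that breaks a rule.

    Pure set lookups: the same inputs always produce the same verdict.
    """
    problems: dict[int, list[Flag]] = {}
    for index, claim in enumerate(claims):
        flags: list[Flag] = []
        if not claim.paper_ids:
            flags.append(Flag.UNCITED)
        for paper_id in claim.paper_ids:
            if paper_id not in corpus_ids:
                flags.append(Flag.OUTSIDE_CORPUS)
            elif paper_id not in retrieved_ids:
                flags.append(Flag.NOT_RETRIEVED)
        if flags:
            problems[index] = flags
    return problems


def drop_flagged(claims, problems):
    """Keep only claims that passed; repairing a claim is out of scope here."""
    return [claim for index, claim in enumerate(claims) if index not in problems]


corpus_ids = {"P1", "P2", "P3"}   # illustrative corpus
retrieved_ids = {"P1", "P2"}      # subset retrieved for this question
claims = [
    Claim("Claim A", ("P1",)),    # grounded
    Claim("Claim B"),             # uncited
    Claim("Claim C", ("P9",)),    # cites a paper outside the corpus
    Claim("Claim D", ("P3",)),    # cites evidence that was not retrieved
]

problems = check_grounding_sketch(claims, corpus_ids, retrieved_ids)
kept = drop_flagged(claims, problems)
print([claim.text for claim in kept])  # ['Claim A']
```

The sketch only drops flagged claims. A repair step, such as re-retrieving evidence and redrafting the claim, would sit between the check and the drop.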

What this means in practice

You get answers you can audit. Every claim links back to a paper ID you can open, and the corpus is finite and inspectable, so you can see exactly what the answer was built from. For literature work, that traceability is the whole point.
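
As a rough illustration of what an audit could look like, the Python sketch below walks each claim back to its corpus record. The corpus layout (a dictionary from paper ID to a record with a title), the placeholder titles, and the function name audit_trail are hypothetical; medground's own storage format is not described here.

```python
def audit_trail(claims, corpus):
    """Return one line per (claim, cited paper) pair for manual review.

    claims: list of (text, paper_ids) tuples that already passed grounding.
    corpus: dict mapping paper ID to a record dict with a "title" key.
    """
    lines = []
    for text, paper_ids in claims:
        for paper_id in paper_ids:
            # A KeyError here would mean an ungrounded claim slipped through.
            record = corpus[paper_id]
            lines.append(f"{text} <- {paper_id}: {record['title']}")
    return lines


# Placeholder records, not real papers.
corpus = {
    "P1": {"title": "Placeholder title one"},
    "P2": {"title": "Placeholder title two"},
}
claims = [("Claim A", ("P1",)), ("Claim B", ("P1", "P2"))]

for line in audit_trail(claims, corpus):
    print(line)
```

Because the corpus is finite, a listing like this covers every source the answer drew on.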

Frequently asked questions

Does this eliminate hallucination completely?

It eliminates fabricated citations: every reference in the output is checked against a real corpus record before the answer ships. It does not judge whether a real source is clinically correct for a given case; that judgment stays with the expert.

Is a grounding check the same as an LLM grading another LLM?

No. An LLM-as-judge is itself probabilistic and can be wrong. check_grounding is a deterministic lookup against the corpus, so its verdict does not depend on a model's confidence.

Can I use this with my own documents?

Yes. medground builds a local corpus you control, then answers only from what is in it, with every citation verified.

Try medground on your own corpus.

Open source, MIT licensed, and running locally in minutes.