Large Language Models

Hallucination

When an LLM produces fluent, confident-sounding output that is factually wrong or invented.

In common use since 2020

Hallucination is when an LLM produces an answer that sounds confident and well-formed but is factually wrong or entirely invented. It is among the most frequently cited limitations of generative AI, and a major reason mature deployments wrap the model in retrieval, validation and human review rather than relying on it alone.

Hallucinations come in flavours. Factual hallucinations invent dates, numbers, citations or biographies. Source hallucinations cite papers, URLs or court cases that do not exist. Coding hallucinations invent function names, package APIs and library behaviours. Logical hallucinations produce a plausible argument with a missing or contradictory step.

The root cause is structural. At its core, an LLM is a probability model over tokens, trained to produce fluent text. It has no built-in mechanism for knowing when it does not know something, and it will always produce some output when asked. The training process rewards plausible-sounding answers over honest "I don't know" responses. Recent reasoning models (GPT-5 reasoning, Claude Sonnet 4 extended thinking) are markedly better calibrated about their own uncertainty, but the problem has not been solved.
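To make the structural point concrete, here is a minimal Python sketch of the last step of generation. It assumes a toy four-token vocabulary and made-up logits: greedy decoding picks the highest-probability token, and it always returns one, even when the distribution is nearly flat. The `next_token_or_abstain` rule and its `min_prob` threshold are a hypothetical add-on, not something real models do by default, and low token probability is only a rough proxy for "not knowing": a model can also be confidently wrong.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_next_token(vocab, logits):
    # Greedy decoding: always returns *some* token, however flat the distribution.
    probs = softmax(logits)
    best = max(range(len(vocab)), key=lambda i: probs[i])
    return vocab[best], probs[best]

def next_token_or_abstain(vocab, logits, min_prob=0.5):
    # Hypothetical add-on: abstain when the top token's probability is low.
    # Real models have no such rule built in; min_prob is an illustrative value.
    token, p = greedy_next_token(vocab, logits)
    return (token, p) if p >= min_prob else ("<abstain>", p)

vocab = ["1912", "1915", "1921", "1923"]
flat_logits = [0.10, 0.00, 0.05, 0.02]  # the model is effectively unsure

print(greedy_next_token(vocab, flat_logits))      # still answers "1912"
print(next_token_or_abstain(vocab, flat_logits))  # abstains instead
```

The point of the contrast: abstention has to be engineered around the model; sampling from a probability distribution never returns "nothing".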

The mitigations that work in 2026:

  • Retrieval-augmented generation (RAG) — give the model real source documents and tell it to cite them. On factual questions this sharply reduces hallucination rates, though the model can still misread or over-extend a source.
  • Structured output with validation — constrain the model to a JSON schema and reject responses that do not conform; this catches invented, missing or mistyped fields, though not wrong values inside valid fields.
  • Tool use / function calling — for math, dates, unit conversions and real-time data, force the model to call a deterministic tool instead of guessing.
  • Self-consistency and multi-sample voting — sample several answers; if they disagree, escalate.
  • Verification passes — a second LLM call (or a smaller validator model) checks the first response against the source.
  • Guardrails on what to refuse — explicit instructions for what to do when uncertain; "if you do not have a source, say so" works better than expected.
  • Human review for high-stakes output — in legal, medical or financial contexts, the model is a draft generator, not the final decision-maker.
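A minimal sketch of the retrieval-augmented generation and "say so when uncertain" items above, in Python. It assumes the chunks have already been fetched by some retriever (not shown); the `Chunk` type, the `build_grounded_prompt` helper and the exact prompt wording are illustrative, not a tested or provider-specific template.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    source_id: str  # e.g. a document ID or URL returned by your retriever
    text: str

def build_grounded_prompt(question: str, chunks: list[Chunk]) -> str:
    """Assemble a RAG-style prompt: numbered sources plus citation and refusal rules."""
    numbered = "\n".join(
        f"[{i}] ({c.source_id}) {c.text}" for i, c in enumerate(chunks, start=1)
    )
    rules = (
        "Answer using ONLY the sources below.\n"
        "Cite every factual claim with its source number, e.g. [2].\n"
        "If the sources do not contain the answer, reply exactly: "
        "\"I don't have a source for that.\""
    )
    return f"{rules}\n\nSources:\n{numbered}\n\nQuestion: {question}\nAnswer:"

prompt = build_grounded_prompt(
    "When was Acme founded?",
    [Chunk("doc-a", "Acme was founded in 1998."), Chunk("doc-b", "Acme is based in Ohio.")],
)
print(prompt)
```

Numbering the sources lets the answer cite `[1]`, `[2]` and so on, which the application can later check against what was actually retrieved.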
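The structured-output item can be approximated with the Python standard library alone. This sketch assumes a hypothetical invoice-extraction task and a hand-written field map (`REQUIRED_FIELDS`); production systems more often use a JSON Schema validator or a provider's structured-output mode. Note what it does and does not catch: invented, missing or mistyped fields are rejected, but a well-formed object with a made-up total still passes.

```python
import json

# Hypothetical schema for an invoice-extraction task: field name -> allowed type(s).
REQUIRED_FIELDS = {"vendor": str, "invoice_number": str, "total": (int, float)}

def parse_and_validate(raw: str) -> dict:
    """Parse model output as JSON and reject anything that doesn't match the schema.

    Raises ValueError (json.JSONDecodeError is a subclass) on any mismatch.
    """
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [k for k in REQUIRED_FIELDS if k not in data]
    extra = [k for k in data if k not in REQUIRED_FIELDS]  # invented fields
    if missing or extra:
        raise ValueError(f"missing={missing} unexpected={extra}")
    for key, expected in REQUIRED_FIELDS.items():
        value = data[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise ValueError(f"field {key!r} has wrong type {type(value).__name__}")
    return data

invoice = parse_and_validate('{"vendor": "Acme", "invoice_number": "INV-7", "total": 120.5}')
```

On failure, a common pattern is to retry with the validation error appended to the prompt, or to escalate after a fixed number of attempts.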
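For the tool-use item, the application, not the model, does the arithmetic. This sketch assumes the model emits a tool call as JSON of the form `{"name": ..., "arguments": {...}}`; real provider APIs each define their own tool-call format and pass the result back to the model in a follow-up message. The two tools and `run_tool_call` are hypothetical examples.

```python
import json
from datetime import date

# Deterministic tools the model may call instead of guessing.
def days_between(start: str, end: str) -> int:
    """Exact day count between two ISO dates (YYYY-MM-DD)."""
    return (date.fromisoformat(end) - date.fromisoformat(start)).days

def convert_miles_to_km(miles: float) -> float:
    return miles * 1.609344  # international mile, exact definition

TOOLS = {"days_between": days_between, "convert_miles_to_km": convert_miles_to_km}

def run_tool_call(tool_call_json: str):
    """Execute a model-proposed tool call of the assumed form {"name": ..., "arguments": {...}}."""
    call = json.loads(tool_call_json)
    name = call["name"]
    if name not in TOOLS:
        # The model named a tool that doesn't exist: a hallucination we can catch.
        raise ValueError(f"unknown tool {name!r}")
    # Unexpected or missing argument names raise TypeError here.
    return TOOLS[name](**call["arguments"])

result = run_tool_call(
    '{"name": "days_between", "arguments": {"start": "2026-01-01", "end": "2026-03-01"}}'
)
```

Rejecting unknown tool names matters because invented function names are one of the coding hallucinations described earlier.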
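Self-consistency voting reduces to sampling several times and counting. In this sketch `sample` is a hypothetical zero-argument callable that returns one model answer (typically sampled at a non-zero temperature); `n=5` and `min_agreement=0.6` are illustrative defaults, not recommended values. Normalizing with `strip().lower()` only works for short answers such as names, dates or numbers; longer free-text answers need a semantic comparison instead.

```python
from collections import Counter
from typing import Callable, Optional, Tuple

def self_consistent_answer(
    sample: Callable[[], str],  # hypothetical: returns one sampled model answer per call
    n: int = 5,
    min_agreement: float = 0.6,
) -> Tuple[Optional[str], float]:
    """Sample n answers; return (majority answer, agreement), or (None, agreement) to escalate."""
    answers = [sample().strip().lower() for _ in range(n)]
    top, count = Counter(answers).most_common(1)[0]
    agreement = count / n
    return (top if agreement >= min_agreement else None), agreement

def fake_sampler(answers):
    # Stand-in for a real model call, so the example runs offline.
    it = iter(answers)
    return lambda: next(it)

print(self_consistent_answer(fake_sampler(["1998", "1998 ", " 1998", "2001", "1998"])))
```

A `None` result is the signal to escalate, for example to retrieval, a stronger model or a human.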
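A verification pass normally means a second model call that checks the answer against the source. The sketch below keeps that shape through a pluggable `judge` function, but as a stand-in it uses a deliberately crude, non-LLM heuristic (`numbers_grounded`): every number in the answer, ignoring `[n]` citation markers, must also appear in the source. It catches invented figures, not invented prose, and all names here are hypothetical.

```python
import re
from typing import Callable

NUMBER = r"\d+(?:\.\d+)?"

def numbers_grounded(answer: str, source: str) -> bool:
    """Crude check: every number in the answer must also appear in the source."""
    answer_wo_cites = re.sub(r"\[\d+\]", "", answer)  # drop citation markers like [1]
    source_nums = set(re.findall(NUMBER, source))
    return all(n in source_nums for n in re.findall(NUMBER, answer_wo_cites))

def verify(
    answer: str,
    source: str,
    judge: Callable[[str, str], bool] = numbers_grounded,  # swap in an LLM-based judge here
) -> str:
    """Label the answer instead of silently passing it through."""
    return "verified" if judge(answer, source) else "needs_review"

src = "Acme was founded in 1998 and had 120 employees in 2020."
print(verify("Acme was founded in 1996 [1].", src))  # needs_review
```

Passing the judge in as a parameter keeps the surrounding pipeline the same whether the check is a heuristic, a smaller validator model or a second call to the main model.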

For a US team shipping LLM products, the rule is: never expose unverified model output as authoritative. The product surface should always make the source visible (citations, retrieved chunks, raw tool outputs) so users can audit. That single design choice converts hallucination from a brand-killer into a manageable risk.
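One way to apply that design rule, sketched in Python: append the cited snippets under the answer so users can audit them, and label problems rather than hide them. It assumes the `[n]` citation convention and a `sources` map from citation number to retrieved text; both the convention and `render_with_sources` are illustrative.

```python
import re

def render_with_sources(answer: str, sources: dict[int, str]) -> str:
    """Append the cited source snippets so users can audit the answer.

    `sources` maps citation numbers (as used in the prompt) to retrieved text.
    Citations pointing at numbers that were never retrieved are flagged, not hidden.
    """
    cited = sorted({int(n) for n in re.findall(r"\[(\d+)\]", answer)})
    if not cited:
        return answer + "\n\n(No sources cited: treat as unverified.)"
    lines = [answer, "", "Sources:"]
    for n in cited:
        if n in sources:
            lines.append(f"[{n}] {sources[n]}")
        else:
            lines.append(f"[{n}] WARNING: cited source was not retrieved")
    return "\n".join(lines)

print(render_with_sources(
    "Acme was founded in 1998 [1] and is based in Ohio [3].",
    {1: "Acme was founded in 1998.", 2: "Acme is based in Ohio."},
))
```

Showing only the sources the answer actually cites keeps the audit trail short, while the warning line surfaces a fabricated citation instead of letting it look authoritative.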
