A Survey on Hallucination in Large Language Models

Source Context

This is a comprehensive survey from Harbin Institute of Technology and Huawei researchers, accepted to ACM TOIS. It represents one of the most cited and thorough treatments of LLM hallucination to date.

Framing Analysis

The authors position this survey as addressing a fundamental gap: existing hallucination taxonomies were designed for task-specific NLG models with clear source contexts (summarization, translation). LLMs operate as open-ended, general-purpose systems where the distinction between “contradicting the source” and “being factually wrong about the world” becomes critical.

The transferable insight in the framing: When systems shift from narrow to general purpose, the failure modes require reclassification. The intrinsic/extrinsic distinction worked for bounded tasks. LLMs need factuality/faithfulness as the primary split.

What makes this survey distinctive:

  • Links each cause to corresponding mitigation strategies (not just a catalog)
  • Includes RAG limitations analysis (acknowledges that the common “solution” has its own hallucination problems)
  • Proposes a lifecycle view: data → training → inference as causal stages

Key Taxonomy

Factuality Hallucination

Output conflicts with verifiable real-world facts, or fabricates claims that cannot be verified at all.

Factual Contradiction:

  • Entity-error: Wrong entities (e.g., “Edison invented the telephone”)
  • Relation-error: Wrong relationships between correct entities

Factual Fabrication:

  • Unverifiability: Claims that cannot be checked against any source
  • Overclaim: Assertions that lack universal validity due to subjective bias

Faithfulness Hallucination

Output diverges from the user’s instruction, the provided context, or its own internal consistency.

  • Instruction inconsistency: Model misinterprets or ignores user directive
  • Context inconsistency: Output contradicts provided context (e.g., RAG input)
  • Logical inconsistency: Internal contradictions in reasoning chains
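
To make the two-way split concrete for annotation or evaluation tooling, here is a minimal sketch of the taxonomy as a data structure. The class and field names are my own shorthand, not the paper’s.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class HallucinationType(Enum):
    # Factuality: output conflicts with, or cannot be checked against, world knowledge
    ENTITY_ERROR = "factuality/contradiction/entity"
    RELATION_ERROR = "factuality/contradiction/relation"
    UNVERIFIABILITY = "factuality/fabrication/unverifiable"
    OVERCLAIM = "factuality/fabrication/overclaim"
    # Faithfulness: output conflicts with the instruction, the context, or itself
    INSTRUCTION_INCONSISTENCY = "faithfulness/instruction"
    CONTEXT_INCONSISTENCY = "faithfulness/context"
    LOGICAL_INCONSISTENCY = "faithfulness/logic"

@dataclass
class HallucinationAnnotation:
    span: str                       # offending span of the model output
    label: HallucinationType
    evidence: Optional[str] = None  # fact, instruction, or context the span conflicts with
```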

Causal Framework

Data-Level Causes

  • Misinformation in training corpora (imitative falsehood)
  • Societal biases propagated through memorization
  • Knowledge boundaries (long-tail, temporal, copyright-restricted)
  • Inferior alignment data introducing facts beyond the model’s knowledge boundary

Training-Level Causes

  • Pre-training: Causal (left-to-right) attention limits bidirectional contextual dependencies; exposure bias between teacher-forced training and free-running generation
  • SFT: Forces models to respond beyond their knowledge boundaries; no “I don’t know” training
  • RLHF: Sycophancy; models learn to please evaluators rather than stay truthful

Inference-Level Causes

  • Stochastic sampling trades factual accuracy for diversity (see the sampling sketch after this list)
  • Over-confidence in locally generated content: attention concentrated on nearby tokens leads to instruction and context forgetting
  • Softmax bottleneck: a single softmax over the vocabulary limits how expressive the output distribution can be
  • Reasoning failures (reversal curse, multi-hop errors)
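
The sampling trade-off is easy to see in code. Below is a minimal sketch, mine rather than the survey’s, of temperature sampling over next-token logits: higher temperature flattens the softmax and shifts probability mass onto lower-ranked, less grounded tokens.

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, rng=None):
    """Sample a token index from next-token logits with temperature scaling.

    Low temperature approaches greedy decoding (the most probable token wins);
    high temperature flattens the distribution and pushes probability mass onto
    lower-ranked tokens -- the diversity-vs-factuality trade-off the survey
    flags as an inference-stage cause of hallucination.
    """
    rng = rng or np.random.default_rng()
    scaled = np.asarray(logits, dtype=float) / max(temperature, 1e-6)
    probs = np.exp(scaled - scaled.max())   # numerically stable softmax
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

# Toy next-token distribution: token 0 is the model's clear favourite.
logits = [5.0, 2.0, 1.0, 0.5]
print(sample_next_token(logits, temperature=0.2))  # almost always 0
print(sample_next_token(logits, temperature=1.5))  # lower-ranked tokens appear far more often
```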

Detection Approaches

Factuality Detection

  • External retrieval: Decompose claims into atomic facts and verify them against knowledge sources (FACTSCORE); a skeleton follows this list
  • Internal checking: Use LLM’s own knowledge via Chain-of-Verification (CoVe)
  • Uncertainty estimation: Token-level entropy, multi-sample consistency (SelfCheckGPT)
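
A rough skeleton of the external-retrieval recipe (FACTSCORE-style): split the output into atomic claims, verify each against a knowledge source, and report the supported fraction. `decompose_into_facts` and `is_supported` are hypothetical placeholders for an LLM decomposition prompt and a retrieval-plus-entailment check, not real APIs.

```python
from typing import Callable, Iterable

def factscore_style_precision(
    generation: str,
    decompose_into_facts: Callable[[str], Iterable[str]],  # placeholder: LLM prompt that lists atomic claims
    is_supported: Callable[[str], bool],                    # placeholder: retrieve evidence, then entailment-check
) -> float:
    """Fraction of atomic claims in `generation` supported by the knowledge source."""
    facts = [f for f in decompose_into_facts(generation) if f.strip()]
    if not facts:
        return 1.0  # nothing checkable; interpret with care
    supported = sum(1 for fact in facts if is_supported(fact))
    return supported / len(facts)
```

SelfCheckGPT-style uncertainty estimation fits the same skeleton: swap `is_supported` for a check of whether the claim stays consistent across several independently sampled generations.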

Faithfulness Detection

  • NLI-based entailment checking between the source/context and the output (sketched after this list)
  • QA-based consistency verification
  • Prompt-based LLM evaluation
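
A minimal sketch of the NLI route, with `entail_prob` standing in for any off-the-shelf entailment model (for example, a DeBERTa MNLI checkpoint wrapped behind this signature). The function name and threshold are mine, not the paper’s.

```python
from typing import Callable, List, Tuple

def nli_faithfulness(
    source: str,                               # instruction plus any retrieved context given to the model
    output_sentences: List[str],               # the model output, split into sentences
    entail_prob: Callable[[str, str], float],  # placeholder NLI scorer: P(premise entails hypothesis)
    threshold: float = 0.5,
) -> List[Tuple[str, float]]:
    """Score each output sentence by how strongly the source entails it and
    return the sentences below `threshold` -- candidate faithfulness
    hallucinations, i.e. content the given context does not support."""
    scored = [(sent, entail_prob(source, sent)) for sent in output_sentences]
    return [(sent, p) for sent, p in scored if p < threshold]
```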

Mitigation Strategies

Data-related:

  • Data filtering and curation
  • Model editing for specific knowledge updates
  • Retrieval-Augmented Generation (with caveats)
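
Since RAG is listed “with caveats,” here is a hedged sketch of what guarding against those caveats can look like in practice. `retrieve`, `relevance`, and `generate` are hypothetical stand-ins for a retriever, a reranker, and an LLM call; the guards (relevance filtering, an explicit refusal instruction) mirror the failure modes flagged in the survey rather than any specific method it names.

```python
from typing import Callable, List

def guarded_rag_answer(
    question: str,
    retrieve: Callable[[str, int], List[str]],  # placeholder retriever: top-k passages
    relevance: Callable[[str, str], float],     # placeholder scorer: question/passage relevance
    generate: Callable[[str], str],             # placeholder LLM call on a full prompt
    k: int = 5,
    min_relevance: float = 0.5,
) -> str:
    """RAG with two guards: drop weakly relevant passages (they can seed new
    hallucinations) and instruct the model to refuse rather than answer
    without supporting evidence."""
    passages = [p for p in retrieve(question, k) if relevance(question, p) >= min_relevance]
    if not passages:
        return "I could not find supporting evidence for this question."
    context = "\n\n".join(passages)
    prompt = (
        "Answer using ONLY the context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return generate(prompt)
```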

Training-related:

  • Contrastive learning for better knowledge encoding
  • Alignment data quality control
  • Teaching refusal behaviors
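
One way to operationalize “alignment data quality control” plus “teaching refusal behaviors” is to probe whether the base model already knows each SFT answer and relabel the rest as refusals. The sketch below is an illustration in the spirit of refusal-aware tuning, not the survey’s prescribed procedure; all three callables are hypothetical.

```python
from typing import Callable, Dict, List

def build_refusal_aware_sft(
    examples: List[Dict[str, str]],                  # [{"question": ..., "answer": ...}, ...]
    model_answers: Callable[[str, int], List[str]],  # placeholder: sample n answers from the base model
    agrees: Callable[[str, str], bool],              # placeholder: does a sampled answer match the reference?
    n_samples: int = 5,
    min_agreement: float = 0.6,
    refusal: str = "I don't know.",
) -> List[Dict[str, str]]:
    """Split SFT data by the base model's knowledge boundary, probed via sampling
    agreement: keep the reference answer where the model already 'knows' it, and
    relabel the rest as refusals so SFT rewards "I don't know" over fabrication."""
    curated = []
    for ex in examples:
        samples = model_answers(ex["question"], n_samples)
        agreement = sum(agrees(s, ex["answer"]) for s in samples) / max(len(samples), 1)
        answer = ex["answer"] if agreement >= min_agreement else refusal
        curated.append({"question": ex["question"], "answer": answer})
    return curated
```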

Inference-related:

  • Factuality-enhanced decoding (e.g., DoLa, ITI); a DoLa-style sketch follows this list
  • Context-attention amplification
  • Multi-path verification
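
A toy sketch of the DoLa idea: contrast a late (“mature”) layer’s next-token distribution with an early (“premature”) layer’s, amplifying knowledge that only emerges in later layers. This is a simplified illustration of the published method, and the toy logits are invented for the example.

```python
import numpy as np

def dola_style_logits(mature_logits, premature_logits, alpha=0.1):
    """Contrast mature-layer and premature-layer next-token distributions.

    Tokens the mature layer already finds implausible are masked out; the rest
    are rescored by how much the mature layer boosts them over the premature
    layer. Layer choice, alpha, and this exact formulation are simplifications.
    """
    def log_softmax(x):
        x = np.asarray(x, dtype=float)
        x = x - x.max()
        return x - np.log(np.exp(x).sum())

    log_p_mature = log_softmax(mature_logits)
    log_p_premature = log_softmax(premature_logits)

    # Plausibility constraint: keep tokens within a factor alpha of the mature layer's best token.
    keep = log_p_mature >= (log_p_mature.max() + np.log(alpha))
    return np.where(keep, log_p_mature - log_p_premature, -np.inf)  # argmax or sample over these scores

# Toy example: two tokens are plausible; the contrast prefers the one whose
# probability grows the most between the early and late layers.
mature = [4.0, 3.8, -1.0, -2.0]
premature = [4.0, 1.0, 0.5, 0.0]
print(int(np.argmax(dola_style_logits(mature, premature))))  # -> 1
```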

Notable Observations

  1. RAG doesn’t solve hallucination. Retrieval can fail; models may ignore retrieved context; irrelevant retrievals can introduce new hallucinations.

  2. Sycophancy is trained-in. RLHF creates models that prefer pleasing evaluators over truthfulness. Both humans and reward models show bias toward sycophantic responses.

  3. Knowledge boundary mismatch. SFT on data requiring facts beyond the model’s pre-training knowledge encourages hallucination: the model learns to fabricate rather than refuse.

  4. Detection-mitigation coupling. The paper’s structure explicitly ties each cause to its mitigation, which is more actionable than treating them as separate problems.

Extracted Content

Atoms

Molecules