Hallucination Causes Lifecycle
Overview
A three-stage causal framework tracing hallucination origins through the LLM development pipeline: Data → Training → Inference. Each stage contributes distinct hallucination mechanisms, and identifying the stage at fault helps target mitigation.
The Framework
Stage 1: Data-Level Causes
Problems with what the model learns from.
Misinformation & Bias
- Imitative falsehood: Model memorizes and reproduces false information from training data
- Societal biases: Gender, nationality, and other biases in the data surface as unfaithful outputs when triggered
Knowledge Boundaries
- Long-tail gaps: Rare information not well-represented in training
- Temporal limits: The knowledge cutoff creates fabrication risk for recent events (a simple guard is sketched after this list)
- Copyright restrictions: Legal constraints create systematic knowledge gaps
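A minimal sketch of how the temporal limit surfaces in practice: flag prompts that reference years past the model’s cutoff and route them to retrieval or a refusal. The cutoff year and helper name are assumptions for illustration, not from any particular model or API.

```python
# Hypothetical guard: prompts mentioning years after the knowledge cutoff are
# fabrication-prone and should be routed to retrieval or met with a refusal.
import re

KNOWLEDGE_CUTOFF_YEAR = 2023  # assumed cutoff, purely illustrative


def mentions_post_cutoff_year(prompt: str, cutoff: int = KNOWLEDGE_CUTOFF_YEAR) -> bool:
    """Return True if the prompt mentions a calendar year later than the cutoff."""
    years = [int(y) for y in re.findall(r"\b(?:19|20)\d{2}\b", prompt)]
    return any(year > cutoff for year in years)


print(mentions_post_cutoff_year("Who won the 2026 World Cup?"))               # True, high risk
print(mentions_post_cutoff_year("Summarize the 1994 Lillehammer Olympics."))  # False
```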
Alignment Data Quality
- New factual knowledge in SFT data that exceeds the model’s pre-training knowledge encourages guessing
- Complex/diverse instructions increase hallucination rates
- Task-specific formatting instructions may encourage hallucination
Stage 2: Training-Level Causes
Problems with how the model learns.
Pre-training Issues
- Unidirectional (causal) attention limits how contextual dependencies are captured
- Soft attention spreads thin across long sequences, diluting focus on relevant tokens
- Exposure bias: The model is trained on gold prefixes but generates from its own outputs, so early mistakes cascade
SFT Issues
- Capability-boundary mismatch: Model trained to answer beyond its knowledge
- No refusal training: Models don’t learn to say “I don’t know” (see the data sketch after this list)
- Overfitting to formatting over factuality
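To make “no refusal training” concrete, refusal-augmented SFT data pairs unanswerable questions with calibrated refusals. The JSONL field names and both examples below are hypothetical, not drawn from a specific dataset.

```python
# Hypothetical refusal-augmented SFT examples: alongside ordinary instruction-
# response pairs, include questions the model cannot know the answer to, paired
# with a refusal, so "I don't know" becomes a learnable behavior.
import json

sft_examples = [
    {
        "instruction": "What year did Apollo 11 land on the Moon?",
        "response": "Apollo 11 landed on the Moon in 1969.",
    },
    {
        # Beyond the model's knowledge boundary: train a refusal, not a guess.
        "instruction": "What did the CEO of Acme Corp announce at last week's meeting?",
        "response": "I don't have information about that meeting, so I can't say.",
    },
]

for example in sft_examples:
    print(json.dumps(example))  # one JSONL line per training example
```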
RLHF Issues
- Sycophancy: The reward signal favors pleasing responses over truthful ones
- Both human annotators and reward models prefer confident, agreeable responses
- The model’s internal belief about what is true can diverge from its output behavior
Stage 3: Inference-Level Causes
Problems with how the model generates.
Decoding Strategy
- Randomness enables creativity but increases hallucination risk
- Higher temperature → more tail sampling → more hallucination (see the numerical sketch after this list)
- Likelihood trap: Purely deterministic, highest-likelihood decoding tends toward bland, repetitive text, so some sampling randomness is hard to avoid
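The temperature effect is easy to check numerically. The logits below are made up for illustration; the point is only that raising the temperature moves probability mass from the head of the next-token distribution into its tail, where hallucination-prone tokens live.

```python
# Toy demo: temperature scaling of logits shifts probability mass from
# high-probability "head" tokens into the low-probability "tail".
import math

logits = [5.0, 3.0, 1.0, 0.5, 0.1]  # made-up next-token logits, head first


def softmax_with_temperature(logits, temperature):
    scaled = [x / temperature for x in logits]
    m = max(scaled)                      # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


for t in (0.5, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    tail_mass = sum(probs[2:])           # everything beyond the top-2 tokens
    print(f"T={t}: top token p={probs[0]:.2f}, tail mass={tail_mass:.2f}")
# T=0.5 keeps tail mass near zero; T=1.5 more than triples it relative to T=1.0.
```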
Attention Drift
- Over-confidence in local context, under-attention to instructions
- Long outputs prone to “instruction forgetting”
- Context window position effects (“lost in the middle”): information placed mid-context is used less reliably
Architectural Constraints
- Softmax bottleneck limits the expressiveness of the output distribution
- Multi-modal next-token distributions can’t always be represented accurately (a rank-based sketch follows this list)
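The bottleneck has a simple linear-algebra reading: next-token log-probabilities are logits HWᵀ minus a per-context normalizer, so over any set of contexts the log-probability matrix has rank at most d + 1, where d is the hidden size; a high-rank (e.g., genuinely multi-modal) target can’t be matched exactly when d is far smaller than the vocabulary. The NumPy sketch below just checks that bound on random weights; it is illustrative, not code from the source.

```python
# Illustrative check of the softmax bottleneck: with hidden size d << vocab V,
# the matrix of context-conditional log-probabilities has rank at most d + 1.
import numpy as np

rng = np.random.default_rng(0)
n_contexts, vocab, d = 200, 500, 16

H = rng.normal(size=(n_contexts, d))  # one hidden state per context
W = rng.normal(size=(vocab, d))       # output embedding matrix

logits = H @ W.T                                               # (n_contexts, vocab)
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

# 200 x 500 entries, yet every achievable log-prob matrix is low-rank.
print(np.linalg.matrix_rank(log_probs))  # at most d + 1 = 17
```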
Reasoning Failures
- Reversal curse: Training on “A is B” doesn’t teach the model “B is A”
- Multi-hop errors: Errors at each reasoning step compound across the chain (a simple calculation follows this list)
- Logical consistency breaks down over complex chains
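A back-of-the-envelope model of the compounding effect, assuming each step succeeds independently with probability p (a simplification, but it shows the scaling):

```python
# Simplified compounding model: a k-step chain where each step succeeds
# independently with probability p succeeds end-to-end with probability p**k.
for p in (0.95, 0.90, 0.80):
    for k in (2, 4, 8):
        print(f"per-step accuracy {p:.2f}, {k} hops -> chain accuracy {p ** k:.2f}")
# Even 95% per-step accuracy drops to about 66% over an 8-hop chain.
```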
How to Apply
Diagnosis: When a hallucination occurs, trace the likely cause (a checklist-as-code sketch follows this list):
- Is the false information prevalent online? → Data (imitative falsehood)
- Is it a domain the model has gaps in? → Data (knowledge boundary)
- Did the model ignore clear context? → Training (SFT) or Inference (attention)
- Did it agree with a wrong premise? → Training (sycophancy)
- Did reasoning break down mid-chain? → Inference (reasoning failure)
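The checklist reads naturally as a small decision procedure. The sketch below simply restates it as code; the symptom flags and returned labels are illustrative, not a standard taxonomy or API.

```python
# The diagnosis checklist restated as code; symptom flag names are illustrative.
def diagnose(symptoms: dict) -> str:
    """Map observed symptoms to the likely lifecycle stage and cause."""
    if symptoms.get("false_claim_is_common_online"):
        return "Data: imitative falsehood"
    if symptoms.get("topic_is_long_tail_or_post_cutoff"):
        return "Data: knowledge boundary"
    if symptoms.get("ignored_clear_context"):
        return "Training (SFT) or Inference: attention drift"
    if symptoms.get("agreed_with_wrong_premise"):
        return "Training (RLHF): sycophancy"
    if symptoms.get("reasoning_broke_mid_chain"):
        return "Inference: reasoning failure"
    return "Unclear -- gather more evidence"


print(diagnose({"agreed_with_wrong_premise": True}))  # Training (RLHF): sycophancy
```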
Mitigation Selection: Match the intervention to the stage (a lookup sketch follows this list):
- Data causes → Data filtering, RAG, model editing
- Training causes → Alignment improvements, refusal training, honesty objectives
- Inference causes → Decoding modifications, attention manipulation, verification loops
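The stage-to-intervention mapping is likewise a lookup; the labels mirror the list above and are illustrative rather than exhaustive.

```python
# Stage-to-intervention lookup mirroring the list above (illustrative labels).
MITIGATIONS = {
    "data": ["data filtering", "retrieval-augmented generation", "model editing"],
    "training": ["alignment improvements", "refusal training", "honesty objectives"],
    "inference": ["decoding modifications", "attention manipulation", "verification loops"],
}


def suggest_mitigations(stage: str) -> list[str]:
    """Return candidate interventions for a diagnosed lifecycle stage."""
    return MITIGATIONS.get(stage.lower(), [])


print(suggest_mitigations("training"))
# ['alignment improvements', 'refusal training', 'honesty objectives']
```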
System Design: Different deployment contexts have different stage exposures:
- RAG systems: Data causes are partially addressed, but inference causes (e.g., attention over long retrieved contexts) are amplified
- Fine-tuned models: Training causes become dominant
- Prompt-engineered systems: Inference causes most relevant
Limitations
Causes often interact:
- Data gaps + SFT mismatch = confident fabrication
- Sycophancy + attention drift = agreeing with misread context
- Imitative falsehood + reasoning = elaborate wrong explanations
The lifecycle is a diagnostic aid, not a deterministic trace.
Related: 05-molecule—llm-hallucination-taxonomy, 05-molecule—capability-alignment-gap, 05-atom—rag-paradox-question