Faithfulness Hallucination
Faithfulness hallucination occurs when an LLM’s output diverges from the user’s input or contradicts itself internally, regardless of whether the output is factually accurate about the world.
Three subtypes (a toy classification sketch follows the list):
Instruction inconsistency: The model fails to follow what the user actually asked for. For example, a user asks for a question to be translated; the model answers the question instead of translating it.
Context inconsistency: The model contradicts information explicitly provided in the prompt or retrieved context. This is especially critical for RAG systems: the model may have access to the correct information in its retrieved passages and still generate something that contradicts them.
Logical inconsistency: The model's reasoning chain contains internal contradictions. The intermediate steps may each be correct while the conclusion does not follow from them, or consecutive steps contradict each other.
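A minimal sketch of how the three subtypes might be separated in an automatic check. Everything here is illustrative: `Sample`, `FaithfulnessError`, `classify`, and `toy_contradicts` are invented names, the heuristic only catches deliberately simple negations, and a real harness would swap in an NLI model or an LLM judge for the contradiction and instruction-following checks.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class FaithfulnessError(Enum):
    """The three subtypes described above."""
    INSTRUCTION_INCONSISTENCY = auto()
    CONTEXT_INCONSISTENCY = auto()
    LOGICAL_INCONSISTENCY = auto()


@dataclass
class Sample:
    instruction: str            # what the user asked for
    context: str                # prompt / retrieved passages the output must respect
    reasoning_steps: list[str]  # intermediate reasoning steps, if any
    output: str                 # final model output


def toy_contradicts(claim: str, source: str) -> bool:
    """Toy contradiction check: flags a claim that matches the source once a
    bare 'not' is removed. A real pipeline would call an NLI model or an LLM
    judge here; this only catches deliberately simple examples."""
    return claim not in source and claim.replace(" not ", " ") in source


def classify(sample: Sample, followed_instruction: bool) -> Optional[FaithfulnessError]:
    """Return the first faithfulness error detected, or None if none is found.
    `followed_instruction` would come from a separate instruction-following
    judge in a real harness; it is passed in to keep the sketch self-contained."""
    if not followed_instruction:
        return FaithfulnessError.INSTRUCTION_INCONSISTENCY
    if toy_contradicts(sample.output, sample.context):
        return FaithfulnessError.CONTEXT_INCONSISTENCY
    if any(toy_contradicts(sample.output, step) for step in sample.reasoning_steps):
        return FaithfulnessError.LOGICAL_INCONSISTENCY
    return None


if __name__ == "__main__":
    sample = Sample(
        instruction="Answer using only the passage.",
        context="The Eiffel Tower is in Paris.",
        reasoning_steps=["The passage says the tower is in Paris."],
        output="The Eiffel Tower is not in Paris.",
    )
    print(classify(sample, followed_instruction=True))
    # FaithfulnessError.CONTEXT_INCONSISTENCY
```

The instruction check is taken as an input rather than computed because judging instruction-following needs the semantics of the request, not just textual overlap with the context.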
This is distinct from factuality hallucination: a response can be faithful to instructions and context while being factually wrong, or factually correct while being unfaithful to what the user actually requested.
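Two toy cases, reusing the hypothetical `Sample` structure from the sketch above, make this independence concrete (texts are invented for illustration):

```python
# Faithful but factually wrong: the output restates the provided (incorrect) context.
faithful_but_wrong = Sample(
    instruction="Answer using only the passage.",
    context="The Great Wall of China is visible from the Moon.",  # a well-known falsehood
    reasoning_steps=["The passage states it is visible from the Moon."],
    output="The Great Wall of China is visible from the Moon.",
)

# Factually correct but unfaithful: the user asked for a translation, not an answer.
factual_but_unfaithful = Sample(
    instruction="Translate into Spanish: 'What is the capital of France?'",
    context="",
    reasoning_steps=[],
    output="The capital of France is Paris.",  # true, but not what was requested
)
```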
Related: 05-atom—factuality-hallucination-definition, 05-molecule—llm-hallucination-taxonomy