Exposure Bias
Autoregressive language models are trained on ground-truth sequences but generate based on their own predictions at inference time. This creates a training-inference gap that compounds errors.
During training, each token prediction sees the correct previous tokens (teacher forcing). During inference, each prediction conditions on the model’s own prior outputs, including any errors. An early mistake propagates forward, influencing all subsequent tokens.
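The contrast between the two decoding regimes can be sketched with a toy stand-in for a learned model. Everything here is hypothetical: a hard-coded lookup table plays the role of the network's argmax prediction, and one transition is deliberately wrong to simulate a single model error.

```python
# Gold training sequence (illustrative only).
GOLD = ["the", "cat", "sat", "on", "a", "mat"]

# Toy "learned" bigram transitions; the entry for "cat" is deliberately
# wrong ("ran" instead of the gold "sat") to simulate one model error.
MODEL = {
    "<s>": "the",
    "the": "cat",
    "cat": "ran",    # the single error
    "ran": "away",
    "away": "fast",
    "fast": "</s>",
    "sat": "on",
    "on": "a",
    "a": "mat",
}

def next_token(prev):
    # Stand-in for a real model's greedy prediction.
    return MODEL.get(prev, "</s>")

def teacher_forced(gold):
    # Training regime: every prediction conditions on the GOLD prefix,
    # so the single error stays local to one position.
    return [next_token(prev) for prev in ["<s>"] + gold[:-1]]

def free_running(length):
    # Inference regime: each prediction conditions on the model's OWN
    # previous output, so the error changes all later context.
    out, prev = [], "<s>"
    for _ in range(length):
        prev = next_token(prev)
        out.append(prev)
    return out

print(teacher_forced(GOLD))       # differs from GOLD at one position
print(free_running(len(GOLD)))    # diverges everywhere after the error
```

Under teacher forcing the wrong token at position 3 is isolated: the next prediction still sees the correct prefix. In free-running mode the same single error rewrites the conditioning context, and every subsequent token follows the wrong branch.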
The snowball effect: a single hallucinated fact early in generation can cascade. The model must now maintain consistency with its own error, potentially generating additional hallucinations to support the first one. What started as one wrong token becomes an internally consistent but externally false narrative.
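The compounding claim can be made quantitative with a back-of-the-envelope calculation. Under the crude assumption that each token errs independently with some fixed rate (the rate here is hypothetical, not measured), the probability of a fully error-free sequence decays geometrically with length:

```python
def p_error_free(epsilon, length):
    # Probability that every token in a length-n sequence is correct,
    # assuming each prediction errs independently with rate epsilon.
    return (1 - epsilon) ** length

# Even a 1% per-token error rate makes long error-free generations rare.
for n in (10, 100, 1000):
    print(n, round(p_error_free(0.01, n), 3))
```

At a 1% per-token error rate, a 100-token generation is error-free only about a third of the time, and a 1000-token generation essentially never. The independence assumption is optimistic: in practice an early error also degrades the conditioning context, so real behavior can be worse than this geometric bound suggests.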
This is a fundamental limitation of the autoregressive training paradigm; it cannot be removed by better data or more parameters, because the mismatch between training conditions and inference conditions is structural.