The Direct Likert Regression Problem

When asked to provide numerical Likert ratings directly, LLMs regress to “safe” center values, typically 3 on a 5-point scale, producing narrow, unrealistic distributions.

This isn’t a fundamental limitation of the models. It’s an artifact of the elicitation method.

The pattern appears across models (GPT-4o, Gemini) and persists regardless of temperature settings. Models almost never respond with 1 or 5, even when human distributions show strong preferences toward scale extremes.
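The narrowness described above is easy to quantify. The sketch below uses synthetic ratings (invented for illustration, not measured data) to show two diagnostics: the share of extreme (1 or 5) responses and the standard deviation of the rating distribution.

```python
from collections import Counter
from statistics import stdev

def distribution_stats(ratings):
    """Summarize a list of 1-5 Likert ratings: full histogram,
    sample standard deviation, and share of extreme (1 or 5) responses."""
    hist = Counter(ratings)
    extreme_share = sum(1 for r in ratings if r in (1, 5)) / len(ratings)
    return {
        "histogram": {k: hist.get(k, 0) for k in range(1, 6)},
        "stdev": stdev(ratings),
        "extreme_share": extreme_share,
    }

# Synthetic data illustrating the pattern: model ratings cluster on 3,
# while human ratings use the full scale, including the extremes.
model_ratings = [3, 3, 4, 3, 3, 2, 3, 3, 4, 3]
human_ratings = [1, 5, 4, 2, 5, 1, 3, 5, 4, 2]

print(distribution_stats(model_ratings))  # extreme_share 0.0, low stdev
print(distribution_stats(human_ratings))  # extreme_share 0.5, high stdev
```

On the synthetic data the model column never touches 1 or 5 and its standard deviation is roughly a third of the human one, which is the shape of the regression this note describes.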

Attempts to nudge models toward the extremes via prompt modification over-correct: the distributions broaden toward human-like shape, but correlation with human rankings degrades. The direct-rating format itself, not the model, creates the problem.
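The tradeoff above can be made concrete by scoring an elicitation variant on two axes: distributional spread and rank agreement with humans. This is a toy sketch on invented data (the `baseline` and `nudged` rating columns are hypothetical, not real model outputs), using a pure-Python Spearman correlation:

```python
from statistics import stdev

def ranks(xs):
    """1-based average ranks; ties receive the mean of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        j = i
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean 0-based position, shifted to 1-based
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    return cov / (va * vb) ** 0.5

# Synthetic illustration of the over-correction pattern:
human    = [1, 5, 4, 2, 5]  # humans use the full scale
baseline = [3, 4, 4, 3, 4]  # direct elicitation: narrow but roughly ordered
nudged   = [1, 5, 5, 5, 1]  # prompt-nudged: human-like spread, scrambled order

print(spearman(human, baseline), stdev(baseline))
print(spearman(human, nudged), stdev(nudged))
```

On this toy data the nudged column matches the human spread far better, yet its rank correlation with the human ratings collapses relative to the baseline: exactly the "better distributions, worse correlation" failure mode.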

Related: 06-atom—semantic-similarity-rating, 07-molecule—elicitation-design-principle, 05-atom—uniform-confidence-problem