Different LLMs Encode Different Subjective Biases

Given identical prompts and data, different LLMs produce systematically different annotations. The disagreement is not random noise; it reflects different biases encoded during training.

Pairwise intercoder reliability among 15 LLMs ranged from 0.16 to 0.69, with larger models agreeing more with each other than with smaller models. This clustering suggests model families may share biases.
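A minimal sketch of how such a pairwise reliability matrix could be computed, assuming each model's annotations are stored as categorical labels over the same items (model names and data below are hypothetical, and Cohen's kappa stands in for whatever agreement metric the original study used):

```python
from itertools import combinations

from sklearn.metrics import cohen_kappa_score

# Hypothetical annotations: each model labels the same items with categorical codes.
annotations = {
    "model_a": ["pos", "neg", "neg", "pos", "neu"],
    "model_b": ["pos", "neg", "pos", "pos", "neu"],
    "model_c": ["neu", "neg", "neg", "pos", "pos"],
}

# Pairwise intercoder reliability (Cohen's kappa) for every pair of models.
pairwise_kappa = {
    (a, b): cohen_kappa_score(annotations[a], annotations[b])
    for a, b in combinations(annotations, 2)
}

# Low-agreement pairs first; clusters of high mutual agreement hint at shared biases.
for (a, b), kappa in sorted(pairwise_kappa.items(), key=lambda kv: kv[1]):
    print(f"{a} vs {b}: kappa = {kappa:.2f}")
```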

The implications extend beyond annotation. Any task involving subjective judgment (summarization, classification, content moderation, sentiment analysis) will reflect the specific biases of the model used. Switching models can change outputs in ways that appear consistent but actually reflect different underlying assumptions.
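One way to surface this before swapping models is to compare not only agreement but also label distributions on the same inputs: two models can each look internally consistent while skewing toward different labels. A hedged sketch (the helper function, model outputs, and label set are placeholders, not any real API):

```python
from collections import Counter

def label_shift(old_labels: list[str], new_labels: list[str]) -> dict[str, float]:
    """Per-label change in frequency when switching from one model's
    annotations to another's on the same items (placeholder helper)."""
    old_freq = Counter(old_labels)
    new_freq = Counter(new_labels)
    n = len(old_labels)
    labels = set(old_freq) | set(new_freq)
    return {lab: (new_freq[lab] - old_freq[lab]) / n for lab in labels}

# Hypothetical moderation labels from two models on identical content.
old_model = ["allow", "allow", "flag", "allow", "flag", "allow"]
new_model = ["allow", "flag", "flag", "flag", "flag", "allow"]

print(label_shift(old_model, new_model))
# A nonzero shift (e.g., more "flag") signals different underlying assumptions,
# even if each model's outputs look consistent from run to run.
```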

This is neither surprising nor unique to LLMs; human annotators also encode biases. But human biases are better studied, more predictable, and often correctable through training. LLM biases remain largely opaque.

Related: [None yet]