LLM Choice Changes Research Conclusions 37% of the Time

In a reanalysis of 14 published political science studies, coefficient estimates derived from LLM annotations diverged from the original conclusions in 37% of cases, either by flipping sign or by changing statistical significance.
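
The divergence criterion can be made concrete with a small check. This is a minimal sketch, not the reanalysis's actual procedure. It assumes that a "case" is one pair of coefficients (original vs. LLM-annotated replication) and that significance means p < 0.05; the source states neither. The names `Estimate`, `diverges`, and `divergence_rate` are hypothetical, and the example numbers are made up for illustration.

```python
from dataclasses import dataclass

# Assumed significance threshold; the source does not say which one was used.
ALPHA = 0.05


@dataclass
class Estimate:
    coef: float     # point estimate of the coefficient
    p_value: float  # two-sided p-value for that coefficient


def diverges(original: Estimate, replicated: Estimate, alpha: float = ALPHA) -> bool:
    """True if the replicated estimate has a different sign or a different
    significance status (p < alpha) than the original. A coefficient of
    exactly zero is treated as non-positive."""
    different_sign = (original.coef > 0) != (replicated.coef > 0)
    different_significance = (original.p_value < alpha) != (replicated.p_value < alpha)
    return different_sign or different_significance


def divergence_rate(pairs) -> float:
    """Share of (original, replicated) pairs that diverge."""
    pairs = list(pairs)
    return sum(diverges(o, r) for o, r in pairs) / len(pairs)


# Hypothetical example pairs, not data from the reanalysis.
pairs = [
    (Estimate(0.42, 0.01), Estimate(0.38, 0.02)),   # same sign, both significant
    (Estimate(0.42, 0.01), Estimate(-0.15, 0.03)),  # sign flip
    (Estimate(0.42, 0.01), Estimate(0.10, 0.40)),   # loses significance
]
print(divergence_rate(pairs))  # 2 of the 3 pairs diverge
```

Either condition alone counts, so a replication that keeps the sign but loses significance is flagged just like an outright sign flip.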

Across specifications, the standard deviation of the LLM-derived estimates was 1.9 times their mean, so the spread was nearly twice the size of the average effect.
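
The 1.9× figure is a ratio of standard deviation to mean, i.e. a coefficient of variation. The sketch below shows that calculation under stated assumptions: it uses the sample standard deviation and divides by the absolute value of the mean (the source specifies neither choice), and `sd_to_mean_ratio` is a hypothetical helper name.

```python
import statistics


def sd_to_mean_ratio(estimates) -> float:
    """Sample standard deviation of the estimates divided by the absolute
    value of their mean. Undefined when the mean is exactly zero."""
    values = list(estimates)
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("mean is zero; ratio is undefined")
    # statistics.stdev uses the n - 1 (sample) denominator.
    return statistics.stdev(values) / abs(mean)


# Hypothetical estimates of one coefficient, each from a different annotator model.
print(sd_to_mean_ratio([-1.0, 3.0]))  # mixed signs push the ratio above 1
```

A ratio above 1 means the spread across specifications exceeds the average effect itself, which is the situation in which estimates of opposite sign stop being surprising.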

In some cases, estimates ranged from strongly negative and significant to strongly positive and significant depending solely on which LLM performed the annotation, with prompts and data held identical.

This variability is not random noise. Different LLMs encode different subjective biases in how they interpret annotation instructions.

Related: [None yet]