Researcher Degrees of Freedom in LLM Choice
Researcher degrees of freedom are the many decision points in LLM-based research that affect results but often go unreported. Even small methodological choices can significantly change findings.
Decision Points
Model Selection:
- Which model family?
- Which model size?
- Which checkpoint/version?
- API vs. local deployment?
Prompting:
- System prompt content
- Few-shot examples (which? how many?)
- Output format specification
- Temperature and sampling parameters
Evaluation:
- Which metrics?
- Human vs. automated evaluation
- Sample size and selection
- Statistical tests applied
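To give a sense of how quickly these decision points multiply, here is a minimal sketch that enumerates every combination of a few of them. The option names and values in `decision_space` are hypothetical placeholders, not recommendations, and a real study would have more dimensions.

```python
from itertools import product

# Hypothetical options for a handful of decision points (placeholder values).
decision_space = {
    "model_family": ["family_a", "family_b"],
    "model_size": ["small", "large"],
    "n_few_shot": [0, 3, 5],
    "temperature": [0.0, 0.7],
    "metric": ["exact_match", "f1"],
}

def enumerate_configs(space):
    """Return every combination of options as a list of dicts."""
    keys = list(space)
    return [dict(zip(keys, values)) for values in product(*(space[k] for k in keys))]

configs = enumerate_configs(decision_space)
print(len(configs))  # 2 * 2 * 3 * 2 * 2 = 48
```

Even this small, illustrative space yields 48 distinct analysis paths, each of which a researcher could plausibly defend.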
The Problem
Researchers may, consciously or not, select the combination of choices that produces the desired result. Findings obtained this way may not replicate under other, equally reasonable choices.
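The effect of such selection can be illustrated with a small simulation. This is a sketch under strong simplifying assumptions: every configuration has the same true accuracy, items are independent, and the function name `simulate_best_of` and all parameter values are made up for illustration.

```python
import random

def simulate_best_of(n_configs, n_items, true_accuracy, n_trials, seed=0):
    """Compare a configuration fixed in advance with the best one picked afterwards.

    All configurations share the same true accuracy, so any gap between the
    two averages comes from selection alone.
    """
    rng = random.Random(seed)
    fixed, selected = [], []
    for _ in range(n_trials):
        # Observed accuracy of each configuration on a finite test set.
        scores = [
            sum(rng.random() < true_accuracy for _ in range(n_items)) / n_items
            for _ in range(n_configs)
        ]
        fixed.append(scores[0])       # configuration chosen before seeing results
        selected.append(max(scores))  # configuration chosen after seeing results
    return sum(fixed) / n_trials, sum(selected) / n_trials

fixed_mean, selected_mean = simulate_best_of(
    n_configs=20, n_items=100, true_accuracy=0.7, n_trials=500
)
print(f"fixed in advance: {fixed_mean:.3f}")
print(f"best of 20:       {selected_mean:.3f}")
```

Under these assumptions the pre-specified configuration averages close to the true accuracy, while the best-of-20 figure is noticeably higher even though no configuration is actually better.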
Mitigation
- Pre-registration of methodology
- Robustness checks across model/prompt variations
- Full reporting of all parameters
- Sensitivity analysis for key decisions
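A robustness check and a simple sensitivity analysis might be sketched as follows. The `evaluate` function is a hypothetical stand-in that returns made-up scores; in practice it would run the actual model and metric. The per-decision spread used here is one simple choice of sensitivity measure, not the only one.

```python
from itertools import product
from statistics import mean

def evaluate(config):
    """Hypothetical stand-in for a real evaluation run; returns a made-up score."""
    score = 0.60
    score += 0.05 if config["model_size"] == "large" else 0.0
    score += 0.01 * config["n_few_shot"]
    score -= 0.04 if config["temperature"] > 0 else 0.0
    return round(score, 4)

# Hypothetical decision space (placeholder values).
space = {
    "model_size": ["small", "large"],
    "n_few_shot": [0, 3],
    "temperature": [0.0, 0.7],
}

# Run every combination and keep the full configuration alongside its score,
# so that all parameters can be reported.
keys = list(space)
runs = []
for values in product(*space.values()):
    config = dict(zip(keys, values))
    runs.append({**config, "score": evaluate(config)})

def sensitivity(runs, key):
    """Gap between the best and worst level of one decision, averaged over the others."""
    levels = {}
    for run in runs:
        levels.setdefault(run[key], []).append(run["score"])
    level_means = [mean(scores) for scores in levels.values()]
    return max(level_means) - min(level_means)

scores = [run["score"] for run in runs]
print(f"score range across all configs: {min(scores):.2f} to {max(scores):.2f}")
for key in keys:
    print(f"{key}: spread {sensitivity(runs, key):.3f}")
```

Reporting the full `runs` table and the range of scores, rather than a single chosen configuration, lets readers see how much a finding depends on each decision.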
Related: 05-atom—evaluation-metric-limitations, 05-molecule—prepartadigmatic-field-dynamics