Few-Shot Learning Plateaus Fast
In-context learning (providing examples in the prompt) improves LLM annotation quality, but the gains are modest and quickly plateau.
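To make the setup concrete, here is a minimal sketch of how an n-shot annotation prompt might be assembled for a chat-style API. The task, label set, example pairs, and variable names are illustrative assumptions, not details from the results below:

```python
# Minimal sketch of n-shot prompt assembly (labels and examples are made up).

def build_messages(examples, target_text, n_shots):
    """Prepend n labeled examples as user/assistant turn pairs."""
    messages = [{
        "role": "system",
        "content": "Label the text POSITIVE, NEGATIVE, or NEUTRAL. "
                   "Reply with the label only.",
    }]
    for text, label in examples[:n_shots]:
        messages.append({"role": "user", "content": text})
        messages.append({"role": "assistant", "content": label})
    # The document actually being annotated always goes last.
    messages.append({"role": "user", "content": target_text})
    return messages

demo_pairs = [
    ("Great battery life.", "POSITIVE"),
    ("Screen cracked within a week.", "NEGATIVE"),
    ("It is a phone.", "NEUTRAL"),
    ("Fast shipping, works as expected.", "POSITIVE"),
    ("Stopped charging after a month.", "NEGATIVE"),
]
doc = "The camera is decent but the speakers are muffled."

# 0-shot and 5-shot prompts differ only in the prepended example pairs.
zero_shot = build_messages(demo_pairs, doc, n_shots=0)
five_shot = build_messages(demo_pairs, doc, n_shots=5)
```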
Mean Krippendorff’s alpha by example count:
- 0-shot: 0.34
- 2-shot: marginal increase over 0-shot
- 5-shot: 0.38 (peak)
- 10-shot: slight decline from the 5-shot peak
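As a hedged sketch of how agreement numbers like these can be computed: the `krippendorff` package exposes an `alpha` function over a coders × units matrix. Whether alpha is measured model-versus-gold (as here, for simplicity) or across several annotators depends on the evaluation setup; the labels and model outputs below are invented:

```python
# Agreement via Krippendorff's alpha (pip install krippendorff).
# All label data here is illustrative, not from the results above.
import numpy as np
import krippendorff

LABELS = {"POSITIVE": 0, "NEGATIVE": 1, "NEUTRAL": 2}

def agreement(model_labels, gold_labels):
    """Alpha over a 2 x n_units reliability matrix (model vs. gold)."""
    data = np.array([
        [LABELS[l] for l in model_labels],
        [LABELS[l] for l in gold_labels],
    ], dtype=float)
    return krippendorff.alpha(reliability_data=data,
                              level_of_measurement="nominal")

gold          = ["POSITIVE", "NEGATIVE", "NEUTRAL", "NEGATIVE", "POSITIVE"]
zero_shot_out = ["POSITIVE", "NEUTRAL",  "NEUTRAL", "NEGATIVE", "NEGATIVE"]
five_shot_out = ["POSITIVE", "NEGATIVE", "NEUTRAL", "NEGATIVE", "POSITIVE"]

print("0-shot alpha:", agreement(zero_shot_out, gold))
print("5-shot alpha:", agreement(five_shot_out, gold))
```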
The 10-shot decline likely reflects context length constraints: with long target texts, additional examples crowd out attention to the actual content being annotated.
Meanwhile, more examples directly increase costs:
- Input tokens scale ~8× from 0-shot to 10-shot
- Inference time grows roughly in proportion to input length
- API costs multiply for proprietary models
The cost-benefit math rarely favors heavy few-shot prompting for annotation at scale. Two to five examples capture most of the benefit.
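A back-of-envelope version of that math, with every number a placeholder assumption (token counts, price, corpus size). With examples comparable in length to the base prompt, 10-shot lands near the ~8× input-token multiple noted above; the multiplier shrinks as targets grow longer relative to the examples:

```python
# Illustrative input-cost arithmetic; all constants are assumptions.
INSTRUCTION_TOKENS = 60    # system prompt + task description (assumed)
TARGET_TOKENS = 90         # document being annotated (assumed)
EXAMPLE_TOKENS = 105       # per in-context example, text + label (assumed)
PRICE_PER_M_INPUT = 3.00   # USD per 1M input tokens (placeholder rate)
N_DOCS = 100_000           # corpus size (assumed)

def input_cost(n_shots: int) -> float:
    tokens_per_call = INSTRUCTION_TOKENS + TARGET_TOKENS + n_shots * EXAMPLE_TOKENS
    return N_DOCS * tokens_per_call / 1e6 * PRICE_PER_M_INPUT

for n in (0, 2, 5, 10):
    print(f"{n:>2}-shot input cost: ${input_cost(n):,.2f}")
# -> roughly $45 at 0-shot vs. $360 at 10-shot under these assumptions
```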
Related: [None yet]