Few-Shot Learning Plateaus Fast

In-context learning (providing examples in the prompt) improves LLM annotation quality, but the gains are modest and quickly plateau.
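Concretely, a k-shot annotation prompt is just the task instruction plus k labeled examples prepended to the target text. A minimal sketch, assuming a sentiment-labeling task; the instruction wording, examples, and labels here are invented for illustration:

```python
# Hypothetical few-shot prompt builder. The instruction text and
# labeled examples are placeholders, not prompts from the note above.
EXAMPLES = [
    ("The product arrived broken.", "negative"),
    ("Works exactly as described.", "positive"),
    ("Shipping was slow but support was helpful.", "positive"),
]

def build_prompt(target_text: str, k: int) -> str:
    """Prepend k labeled examples before the target text."""
    parts = ["Label each text as positive or negative.\n"]
    for text, label in EXAMPLES[:k]:
        parts.append(f"Text: {text}\nLabel: {label}\n")
    parts.append(f"Text: {target_text}\nLabel:")
    return "\n".join(parts)

print(build_prompt("The battery died after two days.", k=2))
```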

Mean Krippendorff’s alpha by example count:

  • 0-shot: 0.34
  • 2-shot: marginal increase over 0-shot
  • 5-shot: 0.38 (peak)
  • 10-shot: slight decline from the peak

The 10-shot decline likely reflects context length constraints: with long target texts, additional examples crowd out attention to the actual content being annotated.
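For reference, agreement scores like those above can be computed with the open-source `krippendorff` Python package. A minimal sketch; the label codes below are fabricated to show the API shape, not real annotation data:

```python
# Agreement between two annotation runs (e.g., two LLM samples, or
# LLM vs. human) on the same items, as Krippendorff's alpha.
import numpy as np
import krippendorff  # pip install krippendorff

# Rows = annotators, columns = items; categorical labels coded as
# integers, np.nan marks a missing annotation.
reliability_data = np.array([
    [0, 1, 1, 0, 2, np.nan],
    [0, 1, 2, 0, 2, 1],
])

alpha = krippendorff.alpha(
    reliability_data=reliability_data,
    level_of_measurement="nominal",  # nominal = unordered category labels
)
print(f"Krippendorff's alpha: {alpha:.2f}")
```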

Meanwhile, more examples directly increase costs:

  • Input tokens scale ~8× from 0-shot to 10-shot
  • Inference time increases proportionally
  • API costs multiply for proprietary models

The cost-benefit math rarely favors heavy few-shot prompting for annotation at scale. Two to five examples capture most of the benefit.
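A back-of-envelope version of that cost math, with illustrative (not measured) token counts and a hypothetical per-token price, chosen so the 0-shot to 10-shot ratio lands near the ~8× noted above:

```python
# Rough input-cost model. All constants are illustrative
# assumptions, not measured values from the note above.
PRICE_PER_1K_INPUT = 0.003  # USD per 1K input tokens (hypothetical rate)
INSTRUCTION_TOKENS = 50     # task instruction
TARGET_TOKENS = 150         # text being annotated
TOKENS_PER_EXAMPLE = 140    # one labeled example (text + label)
N_ITEMS = 100_000           # size of the annotation corpus

base = INSTRUCTION_TOKENS + TARGET_TOKENS
for k in (0, 2, 5, 10):
    tokens = base + k * TOKENS_PER_EXAMPLE
    cost = tokens / 1000 * PRICE_PER_1K_INPUT * N_ITEMS
    print(f"{k:>2}-shot: {tokens:>4} tokens/item "
          f"({tokens / base:.1f}x 0-shot), ~${cost:,.0f} total")
```

Under these assumptions, 10-shot is 8× the 0-shot input tokens per item; whether that buys enough agreement depends on the corpus size and the gap between the 2-shot and 5-shot scores.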

Related: [None yet]