Self-Consistency Through Diverse Sampling

Definition

A prompting technique that generates multiple reasoning paths for the same question, then selects the most consistent answer by majority vote. Trades compute for reliability.

The Mechanism

  1. Sample: Generate N different reasoning chains (using temperature > 0)
  2. Extract: Parse final answer from each chain
  3. Aggregate: Select answer appearing most frequently
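
The three steps above can be sketched as follows. This is a minimal illustration, not a reference implementation: sample_chain is a hypothetical callable standing in for whatever model call produces one reasoning chain at temperature > 0, and the "Answer:" marker is an assumed output convention that the prompt would have to enforce.

```python
import re
from collections import Counter
from typing import Callable, Optional


def extract_answer(chain: str) -> Optional[str]:
    """Return the text after the last 'Answer:' marker, or None if absent.

    The 'Answer:' convention is an assumption for this sketch.
    """
    matches = re.findall(r"Answer:\s*(.+)", chain)
    return matches[-1].strip() if matches else None


def self_consistency(
    sample_chain: Callable[[str], str],  # hypothetical: one sampled chain per call
    question: str,
    n: int = 10,
) -> Optional[str]:
    # 1. Sample: draw n independent reasoning chains.
    chains = [sample_chain(question) for _ in range(n)]
    # 2. Extract: parse a final answer from each chain, dropping unparseable ones.
    answers = [a for a in map(extract_answer, chains) if a is not None]
    if not answers:
        return None
    # 3. Aggregate: return the most frequent answer.
    return Counter(answers).most_common(1)[0][0]
```

Chains with no parseable answer are dropped rather than counted as a vote, which is one possible design choice; a stricter variant could treat them as abstentions and report how many occurred.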

Why It Works

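Different reasoning paths tend to make different intermediate errors, so wrong final answers scatter across many values while correct paths converge on the same one. The majority answer is therefore correct more often than any single chain, and the approach is robust to individual chain failures.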
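
A rough way to see the effect is an idealized model, assumed here purely for illustration: each chain is independently correct with probability p, and there are only two possible answers. Real chains from one model are correlated, and real wrong answers usually scatter across more than two values, so this is a sketch of the intuition rather than a prediction.

```python
from math import comb


def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that a strict majority of n chains is correct.

    Assumes (for illustration only) independent chains, each correct
    with probability p, and a binary answer space. Use odd n to avoid ties.
    """
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )
```

Under these assumptions, three chains at p = 0.6 give 3(0.6²)(0.4) + 0.6³ = 0.648, slightly better than a single chain. At p = 0.4 the same vote gives 0.352, worse than a single chain: voting amplifies whichever answer the chains already favor.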

Key Properties

  • Model-agnostic: Works with any chain-of-thought capable model
  • No training required: Pure inference-time technique
  • Compute-intensive: Requires N full sampled generations per query
  • Works best when: Multiple valid reasoning paths exist

Limitations

  • Expensive at scale (N typically 5-40)
  • Assumes final answers can be extracted and compared, so that vote counts are meaningful
  • Doesn’t work when all paths share the same systematic error
  • May mask uncertainty rather than surfacing it
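
One way to avoid hiding the uncertainty noted in the last point is to return the winning answer's vote share alongside the answer itself. The sketch below assumes answers have already been extracted as strings; the function name is made up for this example.

```python
from collections import Counter
from typing import Optional, Tuple


def vote_with_agreement(answers: list[str]) -> Tuple[Optional[str], float]:
    """Return (majority answer, fraction of chains that agreed with it)."""
    if not answers:
        return None, 0.0
    winner, count = Counter(answers).most_common(1)[0]
    return winner, count / len(answers)
```

A low share (for example, a winner backed by 3 of 10 chains) suggests the result deserves less trust. A high share is not proof of correctness, since all chains may share the same systematic error.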

Practical Application

Best for high-stakes decisions where accuracy matters more than latency or cost. Common in evaluation benchmarks where reliability is paramount.

Related: 05-molecule—chain-of-thought-prompting, 00-source—schulhoff-2024-prompt-report, 05-atom—uniform-confidence-problem