Self-Consistency Through Diverse Sampling
Definition
A prompting technique that generates multiple reasoning paths for the same question, then selects the most consistent answer by majority vote. Trades compute for reliability.
The Mechanism
- Sample: Generate N different reasoning chains (using temperature > 0)
- Extract: Parse final answer from each chain
- Aggregate: Select answer appearing most frequently
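The three steps above can be sketched as follows. This is a minimal illustration, not a reference implementation: `sample_chain` is a hypothetical callable standing in for whatever model call you use (with temperature > 0), and the `Answer:` marker is an assumed output convention, not something the technique prescribes.

```python
import itertools
import re
from collections import Counter
from typing import Callable, Optional


def extract_answer(chain: str) -> Optional[str]:
    """Return the text after the last 'Answer:' marker, or None if absent.

    The 'Answer:' convention is an assumption made for this sketch.
    """
    matches = re.findall(r"Answer:\s*(.+)", chain)
    return matches[-1].strip() if matches else None


def self_consistency(
    sample_chain: Callable[[str], str], question: str, n: int = 10
) -> Optional[str]:
    """Sample n chains, extract each final answer, return the most frequent one."""
    # Sample: n independent generations (the sampler must be stochastic).
    chains = [sample_chain(question) for _ in range(n)]
    # Extract: drop chains whose final answer could not be parsed.
    answers = [a for a in map(extract_answer, chains) if a is not None]
    if not answers:
        return None
    # Aggregate: majority vote (ties go to the answer seen first).
    return Counter(answers).most_common(1)[0][0]


# Stub sampler that cycles through canned chains, standing in for a real model.
_canned = itertools.cycle([
    "6 * 7 = 42. Answer: 42",
    "6 * 7 = 41 (slip). Answer: 41",
    "7 * 6 = 42. Answer: 42",
])


def fake_sampler(question: str) -> str:
    return next(_canned)


print(self_consistency(fake_sampler, "What is 6 * 7?", n=3))  # prints 42
```

In practice the extraction step usually needs answer normalization (units, casing, number formats) so that equivalent answers are counted together.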
Why It Works
Different reasoning paths tend to make different intermediate errors, so wrong final answers scatter across many values while correct paths converge on the same one. The vote is therefore robust to individual chain failures, provided the errors are not shared across chains.
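A rough way to see the effect is an idealized model: assume each chain is independently correct with probability p and there are only two possible answers. Both assumptions are simplifications made for this sketch (real chains are correlated and answers are multi-valued), and `majority_correct_prob` is a name introduced here for illustration.

```python
from math import comb


def majority_correct_prob(p: float, n: int) -> float:
    """Probability that a majority of n independent chains is correct.

    Assumes a binary answer space and odd n, so there are no ties.
    """
    if n % 2 == 0:
        raise ValueError("use odd n to avoid ties in this simplified model")
    # Sum the binomial probabilities of k correct chains for k > n/2.
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )


for p in (0.6, 0.4):
    print(p, [round(majority_correct_prob(p, n), 3) for n in (1, 5, 21)])
```

Under these assumptions, voting helps when p > 0.5 (for p = 0.6, five chains give about 0.683) and hurts when p < 0.5 (for p = 0.4, five chains give about 0.317), which is the idealized version of the shared-error failure case.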
Key Properties
- Model-agnostic: Works with any chain-of-thought capable model
- No training required: Pure inference-time technique
- Compute-intensive: Requires N full sampled generations per query, so cost scales roughly linearly with N
- Works best when: Multiple valid reasoning paths exist
Limitations
- Expensive at scale (N typically 5-40)
- Assumes answers can be extracted and compared, so that vote counts are meaningful (hard for free-form outputs)
- Doesn’t work when all paths share the same systematic error
- May mask uncertainty rather than surfacing it
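One way to surface uncertainty instead of masking it is to report the vote share alongside the winning answer. The sketch below assumes the answers have already been extracted; `vote_with_agreement` and the 0.6 threshold are hypothetical choices for illustration, not part of the technique.

```python
from collections import Counter
from typing import List, Tuple


def vote_with_agreement(answers: List[str]) -> Tuple[str, float]:
    """Return the majority answer and the fraction of chains that chose it."""
    if not answers:
        raise ValueError("no answers to aggregate")
    answer, count = Counter(answers).most_common(1)[0]
    return answer, count / len(answers)


answer, share = vote_with_agreement(["42", "42", "41", "40"])
# Hypothetical threshold: treat low agreement as a signal to escalate or abstain.
if share < 0.6:
    print(f"low agreement ({share:.0%}) on {answer!r}")
else:
    print(f"{answer!r} with {share:.0%} agreement")
```

High agreement is not proof of correctness: if all paths share the same systematic error, the vote can be unanimous and still wrong.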
Practical Application
Best for high-stakes decisions where accuracy matters more than latency or cost. Common in evaluation benchmarks where reliability is paramount.
Related: 05-molecule—chain-of-thought-prompting, 00-source—schulhoff-2024-prompt-report, 05-atom—uniform-confidence-problem