Self-Consistency Shows Limited Effectiveness
Despite its popularity in the prompting literature, Self-Consistency showed limited effectiveness when benchmarked against other techniques.
Self-Consistency works by generating multiple reasoning paths for the same query, then selecting the most common answer through voting. The intuition: correct answers should emerge more consistently than incorrect ones across multiple attempts.
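A minimal sketch of that voting loop, assuming a hypothetical `sample_answer` function standing in for one model call (here it just returns mock answers so the example runs on its own):

```python
import random
from collections import Counter

def sample_answer(prompt: str) -> str:
    """Stand-in for one model call: in practice this would sample a full
    reasoning path and extract the final answer. Mocked here to keep the
    example self-contained."""
    return random.choice(["42", "42", "42", "41"])  # mostly-consistent mock

def self_consistency(prompt: str, n_samples: int = 5) -> str:
    """Sample several independent reasoning paths for the same prompt and
    return the most common final answer (majority vote)."""
    answers = [sample_answer(prompt) for _ in range(n_samples)]
    answer, _count = Counter(answers).most_common(1)[0]
    return answer

if __name__ == "__main__":
    print(self_consistency("What is 6 * 7?"))
```

Note that each call to `self_consistency` costs `n_samples` inference calls, which is where the overhead discussed below comes from.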
In practice, the overhead of multiple inference calls often doesn’t justify the marginal improvement, especially on tasks where the model already performs reasonably well. The technique adds computational cost without proportionate accuracy gains.
This doesn’t mean Self-Consistency is useless; it may still help on specific task types or with particular models. But the benchmarking suggests it’s not the universal improvement its popularity implies.
The gap between technique popularity and measured effectiveness is a recurring pattern in the prompting space. Intuitive appeal doesn’t guarantee empirical value.
Related: 05-atom—few-shot-cot-superiority, 05-molecule—prompting-technique-taxonomy