Learning From Wrong Answers Improves Reasoning
The Principle
Showing models both correct and incorrect reasoning examples produces better results than correct examples alone.
Why This Matters
Traditional few-shot prompting shows only positive examples: here's how to solve this correctly. Contrastive Chain-of-Thought (CCoT) adds negative examples as well: here's a plausible-looking wrong answer, and why it fails.
This mirrors how humans learn complex reasoning. We don't just see solutions; we see common mistakes and understand why they're wrong. The contrast clarifies the boundary between right and wrong approaches.
How to Apply
For each reasoning example in your prompt:
- Show the correct reasoning chain and answer
- Show a plausible incorrect approach
- Explain why the incorrect approach fails
The incorrect example should be genuinely tempting: a mistake a reasonable person might make.
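The recipe above can be sketched as a simple prompt builder. This is a minimal illustration, not a fixed CCoT template: the field labels, the example problem, and the function names are assumptions for demonstration.

```python
# Sketch of assembling a Contrastive CoT prompt.
# The "Question / Correct reasoning / Incorrect reasoning / Why it fails"
# layout is an illustrative choice, not a canonical CCoT format.

def contrastive_example(question, correct, wrong, why_wrong):
    """Format one few-shot example containing both a correct and a
    plausible-but-wrong reasoning chain, plus an explanation of the error."""
    return (
        f"Question: {question}\n"
        f"Correct reasoning: {correct}\n"
        f"Incorrect reasoning: {wrong}\n"
        f"Why it fails: {why_wrong}\n"
    )

def build_prompt(examples, target_question):
    """Concatenate contrastive examples, then pose the new question."""
    shots = "\n".join(contrastive_example(*ex) for ex in examples)
    return f"{shots}\nQuestion: {target_question}\nCorrect reasoning:"

# One contrastive shot: the wrong chain makes a tempting percentage error.
examples = [(
    "A shirt costs $20 after a 20% discount. What was the original price?",
    "Let p be the original price. 0.8 * p = 20, so p = 25. Answer: $25.",
    "20% of $20 is $4, so the original price was $20 + $4 = $24.",
    "It applies the discount to the sale price instead of the unknown original price.",
)]

prompt = build_prompt(
    examples,
    "A laptop costs $600 after a 25% discount. What was the original price?",
)
print(prompt)
```

The prompt ends with `Correct reasoning:` so the model continues with a reasoning chain for the new question, having just seen both a valid chain and a labeled failure mode.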
When This Especially Matters
- Strategic reasoning where multiple approaches seem viable
- Mathematical problems with common misconceptions
- Logical puzzles with attractive-but-wrong paths
Results
CCoT improves over standard CoT by 4-16% on reasoning benchmarks; combining it with self-consistency yields roughly a further 5% gain.
Limitations
- Requires knowing what wrong answers look like (domain expertise needed)
- Doubles or triples example length
- Automated generation of good contrasting examples is hard
The Deeper Insight
Models, like humans, benefit from error awareness. Knowing what’s wrong sharpens the definition of what’s right.
Related: 05-atom—few-shot-cot-superiority, 05-molecule—chain-of-thought-prompting