Learning From Wrong Answers Improves Reasoning

The Principle

Showing models both correct and incorrect reasoning examples produces better results than correct examples alone.

Why This Matters

Traditional few-shot prompting shows only positive examples: “here’s how to solve this correctly.” Contrastive Chain-of-Thought (CCoT) adds negative examples as well: “here’s a plausible-looking wrong answer, and here’s why it fails.”

This mirrors how humans learn complex reasoning. We don’t just see solutions; we also see common mistakes and understand why they’re wrong. The contrast clarifies the boundary between right and wrong approaches.
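For illustration, here is one minimal contrastive pair built around a classic “tempting wrong answer” problem (the wording of both chains is invented for this note, not taken from the CCoT paper):

    Q: A bat and a ball cost $1.10 together. The bat costs $1.00 more
       than the ball. How much does the ball cost?

    Correct reasoning: Let the ball cost x. Then the bat costs x + 1.00,
    so x + (x + 1.00) = 1.10, giving 2x = 0.10 and x = 0.05.
    Answer: $0.05.

    Incorrect reasoning: The bat costs $1.00, so the ball costs
    $1.10 - $1.00 = $0.10. Answer: $0.10.
    Why it fails: if the ball were $0.10, the bat would be $1.10,
    and the total would be $1.20, not $1.10.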

How to Apply

For each reasoning example in your prompt:

  1. Show the correct reasoning chain and answer
  2. Show a plausible incorrect approach
  3. Explain why the incorrect approach fails

The incorrect example should be genuinely tempting: a mistake a reasonable person might make. A prompt-assembly sketch follows below.
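As a sketch of how these three parts might be assembled into a few-shot prompt, here is a minimal Python helper. The ContrastiveExample structure, field names, and template are illustrative assumptions, not the CCoT paper’s exact format:

    from dataclasses import dataclass

    @dataclass
    class ContrastiveExample:
        question: str
        correct_chain: str      # step 1: correct reasoning chain and answer
        incorrect_chain: str    # step 2: plausible but wrong approach
        why_it_fails: str       # step 3: why the wrong approach fails

    def format_example(ex: ContrastiveExample) -> str:
        """Render one contrastive example as prompt text."""
        return (
            f"Question: {ex.question}\n"
            f"Correct reasoning: {ex.correct_chain}\n"
            f"Incorrect reasoning: {ex.incorrect_chain}\n"
            f"Why the incorrect reasoning fails: {ex.why_it_fails}\n"
        )

    def build_prompt(examples: list[ContrastiveExample], query: str) -> str:
        """Concatenate contrastive examples, then pose the new question."""
        shots = "\n".join(format_example(ex) for ex in examples)
        return f"{shots}\nQuestion: {query}\nCorrect reasoning:"

Ending the prompt with a “Correct reasoning:” cue nudges the model to produce the positive chain for the new question rather than another negative example.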

When This Especially Matters

  • Strategic reasoning where multiple approaches seem viable
  • Mathematical problems with common misconceptions
  • Logical puzzles with attractive-but-wrong paths

Results

CCoT improves over standard CoT by 4-16% on reasoning benchmarks; combining it with self-consistency yields roughly a further 5% improvement.
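For reference, self-consistency samples several reasoning chains and majority-votes over their final answers. A minimal sketch, assuming a caller-supplied sample_completion function (a hypothetical stand-in for whatever model API you use) whose completions end with the answer on the last line:

    from collections import Counter

    def self_consistent_answer(prompt: str, sample_completion, n: int = 10) -> str:
        """Sample n reasoning chains and majority-vote on the final answers.

        Assumes each completion's last line is the answer; adapt the
        answer-extraction step to your own output format.
        """
        answers = []
        for _ in range(n):
            completion = sample_completion(prompt)
            answers.append(completion.strip().splitlines()[-1])
        return Counter(answers).most_common(1)[0][0]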

Limitations

  • Requires knowing what wrong answers look like (domain expertise needed)
  • Doubles or triples example length
  • Automated generation of good contrasting examples is hard

The Deeper Insight

Models, like humans, benefit from error awareness. Knowing what’s wrong sharpens the definition of what’s right.

Related: 05-atom—few-shot-cot-superiority, 05-molecule—chain-of-thought-prompting