Learning From Wrong Answers Improves Reasoning

The Principle

Showing models both correct and incorrect reasoning examples produces better results than correct examples alone.

Why This Matters

Traditional few-shot prompting shows only positive examples: “here’s how to solve this correctly.” Contrastive Chain-of-Thought (CCoT) adds negative examples as well: “here’s a plausible-looking wrong answer, and here’s why it fails.”

This mirrors how humans learn complex reasoning. We don’t just see solutions; we also see common mistakes and understand why they’re wrong. The contrast clarifies the boundary between right and wrong approaches.
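For illustration, here is one minimal contrastive pair built around a classic “tempting wrong answer” problem (the wording of both chains is invented for this note, not taken from the CCoT paper):

    Q: A bat and a ball cost $1.10 together. The bat costs $1.00 more
       than the ball. How much does the ball cost?

    Correct reasoning: Let the ball cost x. Then the bat costs x + 1.00,
    so x + (x + 1.00) = 1.10, giving 2x = 0.10 and x = 0.05.
    Answer: $0.05.

    Incorrect reasoning: The bat costs $1.00, so the ball costs
    $1.10 - $1.00 = $0.10. Answer: $0.10.
    Why it fails: if the ball were $0.10, the bat would be $1.10,
    and the total would be $1.20, not $1.10.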

How to Apply

For each reasoning example in your prompt:

  1. Show the correct reasoning chain and answer
  2. Show a plausible incorrect approach
  3. Explain why the incorrect approach fails

The incorrect example should be genuinely tempting: a mistake a reasonable person might make. A prompt-assembly sketch follows below.
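As a sketch of how these three parts might be assembled into a few-shot prompt, here is a minimal Python helper. The ContrastiveExample structure, field names, and template are illustrative assumptions, not the CCoT paper’s exact format:

    from dataclasses import dataclass

    @dataclass
    class ContrastiveExample:
        question: str
        correct_chain: str      # step 1: correct reasoning chain and answer
        incorrect_chain: str    # step 2: plausible but wrong approach
        why_it_fails: str       # step 3: why the wrong approach fails

    def format_example(ex: ContrastiveExample) -> str:
        """Render one contrastive example as prompt text."""
        return (
            f"Question: {ex.question}\n"
            f"Correct reasoning: {ex.correct_chain}\n"
            f"Incorrect reasoning: {ex.incorrect_chain}\n"
            f"Why the incorrect reasoning fails: {ex.why_it_fails}\n"
        )

    def build_prompt(examples: list[ContrastiveExample], query: str) -> str:
        """Concatenate contrastive examples, then pose the new question."""
        shots = "\n".join(format_example(ex) for ex in examples)
        return f"{shots}\nQuestion: {query}\nCorrect reasoning:"

Ending the prompt with a “Correct reasoning:” cue nudges the model to produce the positive chain for the new question rather than another negative example.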

When This Especially Matters

  • Strategic reasoning where multiple approaches seem viable
  • Mathematical problems with common misconceptions
  • Logical puzzles with attractive-but-wrong paths

Results

CCoT improves over standard CoT by 4-16% on reasoning benchmarks; combining it with self-consistency yields roughly a further 5% improvement.
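For reference, self-consistency samples several reasoning chains and majority-votes over their final answers. A minimal sketch, assuming a caller-supplied sample_completion function (a hypothetical stand-in for whatever model API you use) whose completions end with the answer on the last line:

    from collections import Counter

    def self_consistent_answer(prompt: str, sample_completion, n: int = 10) -> str:
        """Sample n reasoning chains and majority-vote on the final answers.

        Assumes each completion's last line is the answer; adapt the
        answer-extraction step to your own output format.
        """
        answers = []
        for _ in range(n):
            completion = sample_completion(prompt)
            answers.append(completion.strip().splitlines()[-1])
        return Counter(answers).most_common(1)[0][0]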

Limitations

  • Requires knowing what wrong answers look like (domain expertise needed)
  • Doubles or triples example length
  • Automated generation of good contrasting examples is hard

The Deeper Insight

Models, like humans, benefit from error awareness. Knowing what’s wrong sharpens the definition of what’s right.

Related: 05-atom—few-shot-cot-superiority, 05-molecule—chain-of-thought-prompting