When Does Retrieval Hurt More Than It Helps?

Retrieval isn’t universally beneficial. Under what conditions does adding retrieval actually degrade system performance compared to parametric-only generation?

Observed conditions where retrieval hurts:

The query is well within the model’s parametric knowledge: retrieval adds noise without information gain
Retrieved documents are semantically adjacent but factually misleading: “soft noise” that the model incorrectly trusts
The latency budget is tight: retrieval overhead isn’t worth a marginal quality improvement
The model is better at synthesis than selection: giving it more context overwhelms its filtering capability

This question matters for deciding when to deploy RAG versus simpler approaches. The pattern to watch: retrieval helps most when the model genuinely lacks knowledge, and hurts most when retrieval quality is low and the model can’t compensate.
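
One way to make that pattern concrete is a small gate that decides per query whether to retrieve at all. The sketch below is illustrative only: the QuerySignals fields, the should_retrieve function, and the default thresholds are all hypothetical, and it assumes you already have some estimate of the model's confidence and of retrieval relevance (for example from a cheap probe), which the note itself does not specify. It covers the first three conditions above; the synthesis-versus-selection condition would need a separate limit on how much context is passed in.

```python
from dataclasses import dataclass


@dataclass
class QuerySignals:
    """Hypothetical per-query signals; how they are estimated is left open."""
    parametric_confidence: float   # 0..1, estimate that the model already knows the answer
    expected_relevance: float      # 0..1, estimate of how relevant retrieved documents will be
    retrieval_latency_ms: float    # expected cost of the retrieval step
    latency_budget_ms: float       # time left to spend on this query


def should_retrieve(
    signals: QuerySignals,
    confidence_cutoff: float = 0.8,   # assumed threshold, tune on your own data
    min_relevance: float = 0.5,       # assumed threshold, tune on your own data
) -> bool:
    """Return True only when retrieval is likely to help more than it hurts."""
    # Tight latency budget: retrieval overhead is not affordable.
    if signals.retrieval_latency_ms > signals.latency_budget_ms:
        return False
    # Query is within parametric knowledge: retrieval would mostly add noise.
    if signals.parametric_confidence >= confidence_cutoff:
        return False
    # Low retrieval quality: risk of "soft noise" the model trusts incorrectly.
    if signals.expected_relevance < min_relevance:
        return False
    # Model lacks the knowledge and retrieval looks trustworthy enough.
    return True


if __name__ == "__main__":
    unknown_fact = QuerySignals(0.2, 0.9, 120.0, 500.0)
    print(should_retrieve(unknown_fact))
```

The ordering of the checks is a design choice: the latency check comes first because it is the cheapest to evaluate and rules out retrieval regardless of quality.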
