The Retrieval-Generation Cascade Problem
In RAG systems, retrieval quality and generation quality are interdependent, but the relationship between them is not straightforward.
Good retrieval doesn’t guarantee good generation. A retriever might return highly relevant documents that contain conflicting information, overwhelming the generator’s reasoning capacity.
Poor retrieval can be partially compensated by strong generation. A capable LLM might extract value from marginally relevant documents or fall back on parametric knowledge when retrieval fails.
The cascade creates evaluation challenges: end-to-end metrics conflate component failures. When the system gives a wrong answer, was it a retrieval miss, a generation hallucination, or a failure to integrate multiple sources?
This argues for multi-stage evaluation that isolates each component's contribution. But isolation is itself artificial, since real performance emerges from the interaction between components.
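One way to make the attribution question concrete is a small diagnostic harness. The sketch below is a minimal illustration, not a complete methodology: it assumes you have gold (relevant) document IDs and a reference answer per query, and the names `retrieval_hit`, `attribute_failure`, and the optional counterfactual `answer_with_gold_context` probe are all hypothetical constructs for this example. Correctness here is toy exact-match; real systems would use graded judgments.

```python
# Sketch of multi-stage failure attribution for a RAG pipeline.
# Assumes gold (relevant) document IDs and a reference answer exist
# for each query; all function names here are illustrative.

def retrieval_hit(retrieved_ids, gold_ids, k=5):
    """True if any gold document appears in the top-k retrieved IDs."""
    gold = set(gold_ids)
    return any(doc_id in gold for doc_id in retrieved_ids[:k])

def answer_correct(answer, reference):
    """Toy correctness check: exact match after normalization."""
    return answer.strip().lower() == reference.strip().lower()

def attribute_failure(retrieved_ids, gold_ids, answer, reference,
                      answer_with_gold_context=None):
    """Classify an end-to-end error as retrieval- or generation-side."""
    if answer_correct(answer, reference):
        return "correct"
    if not retrieval_hit(retrieved_ids, gold_ids):
        return "retrieval_miss"
    # Retrieval surfaced gold docs yet the answer is wrong.
    # If we also ran the generator on the gold context alone
    # (a counterfactual probe), a wrong answer there points to a
    # pure generation failure rather than an integration failure.
    if (answer_with_gold_context is not None
            and not answer_correct(answer_with_gold_context, reference)):
        return "generation_failure"
    return "integration_failure"  # right docs retrieved, wrong synthesis

# Example: gold doc d2 was retrieved, but the answer is still wrong.
print(attribute_failure(["d1", "d2"], ["d2"], "Paris", "Lyon"))
# → integration_failure
```

Bucketing errors this way over an evaluation set turns the conflated end-to-end metric into per-component failure rates, while the counterfactual gold-context run probes exactly the interaction effect the paragraph above warns about.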