TRACe Framework Decomposition

RAG system evaluation decomposes into four dimensions that separate retriever quality from generator quality:

Retriever metric:

  • Relevance: Did the retrieved context contain information pertinent to the query?

Generator metrics:

  • uTilization: Did the generator actually use the relevant retrieved information?
  • Adherence: Does the response stay grounded in context without introducing unsupported claims?
  • Completeness: Does the response fully address all parts of the question?

The key insight: a poor response can stem from retrieval failure (wrong documents) or generation failure (right documents, wrong synthesis). You can’t fix what you can’t locate. TRACe enables targeted debugging by attributing each failure to a specific pipeline component.
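To make the four dimensions concrete, here is a toy sketch that scores them with simple word overlap. This is illustrative only: the function name and overlap formulas are my own assumptions, not TRACe's actual definitions (which rely on finer-grained, typically annotation- or LLM-based judgments).

```python
def tokens(text: str) -> set[str]:
    """Crude tokenizer: lowercase word set (illustrative only)."""
    return set(text.lower().split())

def trace_scores(query: str, context: str, response: str, ground_truth: str) -> dict:
    """Toy TRACe-style scores via word overlap — a sketch, not the real metrics.

    relevance    (retriever): how much of the answer the retrieved context covers
    utilization  (generator): how much of the relevant context the response used
    adherence    (generator): fraction of the response grounded in the context
    completeness (generator): how much of the answer the response covers
    """
    c, r, g = tokens(context), tokens(response), tokens(ground_truth)
    relevant = c & g  # context words actually needed for the answer
    return {
        "relevance":    len(relevant) / len(g) if g else 0.0,
        "utilization":  len(r & relevant) / len(relevant) if relevant else 0.0,
        "adherence":    len(r & c) / len(r) if r else 0.0,
        "completeness": len(r & g) / len(g) if g else 0.0,
    }
```

A grounded response scores high on adherence; a hallucinated one (e.g. answering "Lyon" from a context about Paris) scores zero on adherence and completeness even if retrieval was perfect — which is exactly the retriever-vs-generator attribution the note describes.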

Related: [None yet]