RAG Evaluation Targets Framework

Overview

The framework is a structured approach to RAG evaluation built on pairwise relationships between system components and their outputs. It distinguishes six evaluation targets: three for retrieval and three for generation.

Retrieval Targets

Three relationships define what “good retrieval” means:

Target	Relationship	What It Measures
Relevance	Documents ↔ Query	Do retrieved docs match the information need?
Comprehensiveness	Documents ↔ Documents	Do retrieved docs provide diverse, complete coverage?
Correctness	Documents ↔ Candidates	Are the right docs ranked above wrong docs?
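
As a rough illustration, each retrieval target can be approximated with a simple set-based metric. This is a minimal sketch, not part of the framework itself: the function names, the use of precision@k for relevance, aspect coverage for comprehensiveness, and pairwise ordering accuracy for correctness are all assumptions chosen for illustration, and the document IDs and aspect labels are hypothetical.

```python
from typing import Dict, Optional, Sequence, Set


def precision_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Relevance proxy: fraction of the top-k retrieved docs that are relevant."""
    top_k = retrieved[:k]
    if not top_k:
        return 0.0
    return sum(doc in relevant for doc in top_k) / len(top_k)


def aspect_coverage(
    retrieved: Sequence[str],
    doc_aspects: Dict[str, Set[str]],
    required_aspects: Set[str],
) -> float:
    """Comprehensiveness proxy: share of required aspects covered by any retrieved doc."""
    if not required_aspects:
        return 1.0
    covered: Set[str] = set()
    for doc in retrieved:
        covered |= doc_aspects.get(doc, set())
    return len(covered & required_aspects) / len(required_aspects)


def pairwise_rank_accuracy(retrieved: Sequence[str], relevant: Set[str]) -> Optional[float]:
    """Correctness proxy: fraction of (relevant, non-relevant) pairs ordered correctly.

    Returns None when the ranking contains no such pair, since the score is undefined.
    """
    rel_pos = [i for i, doc in enumerate(retrieved) if doc in relevant]
    non_pos = [i for i, doc in enumerate(retrieved) if doc not in relevant]
    total = len(rel_pos) * len(non_pos)
    if total == 0:
        return None
    correct = sum(r < n for r in rel_pos for n in non_pos)
    return correct / total


# Hypothetical example data (assumed for illustration only).
retrieved = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}
doc_aspects = {"d1": {"pricing"}, "d2": {"limits"}, "d3": {"pricing"}}
required = {"pricing", "limits", "auth"}

p_at_2 = precision_at_k(retrieved, relevant, k=2)
coverage = aspect_coverage(retrieved, doc_aspects, required)
rank_acc = pairwise_rank_accuracy(retrieved, relevant)
```

Keeping the three scores separate, rather than folding them into one number, preserves the diagnostic value of the targets: a system can score well on relevance while still covering only part of the information need.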

Generation Targets

Three parallel relationships define “good generation”:

Target	Relationship	What It Measures
Relevance	Response ↔ Query	Does the response address what was asked?
Faithfulness	Response ↔ Documents	Does the response accurately reflect retrieved content?
Correctness	Response ↔ Ground Truth	Is the response factually accurate?
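
The three generation relationships can be sketched the same way, using one scoring function applied to different pairs. The token-overlap measure below is a deliberately crude stand-in assumed for illustration; in practice these targets are usually scored with stronger methods such as model-based judges or entailment checks. All names and example strings are hypothetical.

```python
import re
from typing import Dict, List, Set


def tokens(text: str) -> Set[str]:
    """Lowercase alphanumeric tokens of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap(candidate: str, reference: str) -> float:
    """Fraction of candidate tokens that also appear in the reference."""
    cand = tokens(candidate)
    if not cand:
        return 0.0
    return len(cand & tokens(reference)) / len(cand)


def generation_scores(
    query: str, response: str, documents: List[str], ground_truth: str
) -> Dict[str, float]:
    """Score one response against each of the three generation targets."""
    return {
        # Response <-> Query: how many query terms the response addresses.
        "relevance": overlap(query, response),
        # Response <-> Documents: how many response terms the documents support.
        "faithfulness": overlap(response, " ".join(documents)),
        # Response <-> Ground Truth: how many ground-truth terms the response recovers.
        "correctness": overlap(ground_truth, response),
    }


# Hypothetical example (assumed for illustration only).
scores = generation_scores(
    query="What is the refund window?",
    response="The refund window is 30 days.",
    documents=["Refunds are accepted within 30 days of purchase."],
    ground_truth="30 days",
)
```

Note that the faithfulness score here is low even though the response agrees with the document, because "refund" and "Refunds" do not match lexically. That weakness is the reason lexical overlap should be treated only as a placeholder for a semantic measure.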

Application

The framework is diagnostic: when a system underperforms, the relationships pinpoint where.

Low retrieval relevance → query understanding or embedding mismatch
Low retrieval comprehensiveness → biased retrieval or insufficient corpus
Low faithfulness but high correctness → model ignoring context, using parametric knowledge
High faithfulness but low correctness → accurate summarization of wrong sources
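
The mapping above can be written as a small rule table. This is a minimal sketch under stated assumptions: the score keys, the `diagnose` function, and the 0.5 threshold separating "low" from "high" are hypothetical and would need tuning for any real metric.

```python
from typing import Dict, List


def diagnose(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Map low/high score patterns to the likely causes listed above.

    The threshold is an assumed cut-off, not a value from the framework.
    """

    def low(name: str) -> bool:
        return scores[name] < threshold

    findings: List[str] = []
    if low("retrieval_relevance"):
        findings.append("query understanding or embedding mismatch")
    if low("retrieval_comprehensiveness"):
        findings.append("biased retrieval or insufficient corpus")
    if low("faithfulness") and not low("correctness"):
        findings.append("model ignoring context, using parametric knowledge")
    if not low("faithfulness") and low("correctness"):
        findings.append("accurate summarization of wrong sources")
    return findings


# Hypothetical scores: retrieval looks fine, the answer is faithful but wrong.
example = diagnose(
    {
        "retrieval_relevance": 0.9,
        "retrieval_comprehensiveness": 0.8,
        "faithfulness": 0.9,
        "correctness": 0.2,
    }
)
```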

Limitations

The framework assumes that ground truth exists for correctness evaluation. This assumption is problematic for exploratory or creative tasks, where “correct” is undefined.

