RAG Evaluation Survey (Gan et al. 2025)

Comprehensive survey of evaluation methods for Retrieval-Augmented Generation systems, bridging traditional IR/NLG metrics with emerging LLM-based evaluation approaches.

Core Framing

The authors position RAG evaluation as uniquely challenging because it sits at the intersection of two established fields, information retrieval and natural language generation, while introducing novel complexities from the LLM era. They structure evaluation along two axes:

  1. Internal Evaluation: Component-level performance and methodology-specific metrics
  2. External Evaluation: System-wide factors like safety and efficiency

This framing is transferable to any hybrid AI system where multiple components interact.
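The internal/external split can be made concrete with a small sketch. This is not from the survey: the metric names, the record fields, and the `evaluate_rag` helper are illustrative assumptions, using retrieval recall as a stand-in internal metric and latency as a stand-in external factor.

```python
# Illustrative sketch (not the survey's method): separate internal,
# component-level metrics from external, system-wide measurements
# for one hypothetical RAG query run.

def recall_at_k(retrieved, relevant, k=5):
    """Internal metric: fraction of relevant docs appearing in the top-k retrieved."""
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant) if relevant else 0.0

def evaluate_rag(run):
    """`run` is a hypothetical record of one RAG query execution."""
    internal = {
        # Component-level: how well did the retriever do?
        "retrieval_recall@5": recall_at_k(run["retrieved_ids"], run["relevant_ids"]),
        # Placeholder for a generation-side check (faithfulness, correctness, ...).
        "answer_nonempty": bool(run["answer"].strip()),
    }
    external = {
        # System-wide: efficiency of the whole pipeline.
        "latency_s": run["latency_s"],
    }
    return {"internal": internal, "external": external}

report = evaluate_rag({
    "retrieved_ids": ["d1", "d2", "d3", "d4", "d5"],
    "relevant_ids": ["d2", "d9"],
    "answer": "Grounded answer text.",
    "latency_s": 0.42,
})
```

The point of the structure, mirroring the survey's framing, is that the two dictionaries are computed and reported separately: internal scores diagnose individual components, while external scores describe the deployed system as a whole.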

Key Contributions

  • Systematic taxonomy of evaluation targets using pairwise relationships
  • Catalog of both conventional metrics and LLM-based evaluation methods
  • Safety evaluation framework addressing RAG-specific attack vectors
  • Meta-analysis of evaluation practices in high-impact RAG research
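The LLM-based evaluation methods cataloged above typically follow an LLM-as-judge pattern: prompt a grader model with the retrieved context and the generated answer, then parse its verdict. A minimal sketch of that pattern follows; the prompt wording and both helper functions are assumptions for illustration, not taken from the survey, and `build_faithfulness_prompt` would feed whatever LLM API the evaluator uses.

```python
# Hypothetical LLM-as-judge sketch for faithfulness grading.
# The prompt text and verdict convention are illustrative assumptions.

def build_faithfulness_prompt(context: str, answer: str) -> str:
    """Construct a judge prompt asking whether the answer is supported by the context."""
    return (
        "You are grading a RAG system's answer for faithfulness.\n"
        "Context:\n" + context + "\n\n"
        "Answer:\n" + answer + "\n\n"
        "Reply with exactly 'FAITHFUL' if every claim in the answer is "
        "supported by the context, otherwise reply 'UNFAITHFUL'."
    )

def parse_verdict(judge_reply: str) -> bool:
    """Map the judge model's reply to a boolean faithfulness verdict."""
    return judge_reply.strip().upper().startswith("FAITHFUL")
```

Note the faithfulness/correctness distinction the survey draws: this judge checks only whether the answer is entailed by the retrieved context, not whether it is factually correct against ground truth, which would require a separate reference-based check.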

Extracted Content

  • 05-atom—internal-external-evaluation-distinction
  • 05-atom—faithfulness-correctness-distinction
  • 05-atom—rag-specific-attack-vectors
  • 05-atom—llm-as-evaluator-paradigm
  • 05-molecule—rag-evaluation-targets-framework
  • 05-molecule—rag-safety-evaluation-taxonomy