RAG Evaluation Survey (Gan et al. 2025)

Comprehensive survey of evaluation methods for Retrieval-Augmented Generation systems, bridging traditional IR/NLG metrics with emerging LLM-based evaluation approaches.

Core Framing

The authors position RAG evaluation as uniquely challenging because it sits at the intersection of two established fields, information retrieval and natural language generation, while introducing novel complexities from the LLM era. They structure evaluation along two axes:

  1. Internal Evaluation: Component-level performance and methodology-specific metrics
  2. External Evaluation: System-wide factors like safety and efficiency

This framing is transferable to any hybrid AI system where multiple components interact.
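The internal/external split can be made concrete with a small sketch. This is not from the survey: the metric names, the record fields, and the `evaluate_rag` helper are illustrative assumptions, using retrieval recall as a stand-in internal metric and latency as a stand-in external factor.

```python
# Illustrative sketch (not the survey's method): separate internal,
# component-level metrics from external, system-wide measurements
# for one hypothetical RAG query run.

def recall_at_k(retrieved, relevant, k=5):
    """Internal metric: fraction of relevant docs appearing in the top-k retrieved."""
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant) if relevant else 0.0

def evaluate_rag(run):
    """`run` is a hypothetical record of one RAG query execution."""
    internal = {
        # Component-level: how well did the retriever do?
        "retrieval_recall@5": recall_at_k(run["retrieved_ids"], run["relevant_ids"]),
        # Placeholder for a generation-side check (faithfulness, correctness, ...).
        "answer_nonempty": bool(run["answer"].strip()),
    }
    external = {
        # System-wide: efficiency of the whole pipeline.
        "latency_s": run["latency_s"],
    }
    return {"internal": internal, "external": external}

report = evaluate_rag({
    "retrieved_ids": ["d1", "d2", "d3", "d4", "d5"],
    "relevant_ids": ["d2", "d9"],
    "answer": "Grounded answer text.",
    "latency_s": 0.42,
})
```

The point of the structure, mirroring the survey's framing, is that the two dictionaries are computed and reported separately: internal scores diagnose individual components, while external scores describe the deployed system as a whole.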

Key Contributions

  • Systematic taxonomy of evaluation targets using pairwise relationships
  • Catalog of both conventional metrics and LLM-based evaluation methods
  • Safety evaluation framework addressing RAG-specific attack vectors
  • Meta-analysis of evaluation practices in high-impact RAG research
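The LLM-based evaluation methods cataloged above typically follow an LLM-as-judge pattern: prompt a grader model with the retrieved context and the generated answer, then parse its verdict. A minimal sketch of that pattern follows; the prompt wording and both helper functions are assumptions for illustration, not taken from the survey, and `build_faithfulness_prompt` would feed whatever LLM API the evaluator uses.

```python
# Hypothetical LLM-as-judge sketch for faithfulness grading.
# The prompt text and verdict convention are illustrative assumptions.

def build_faithfulness_prompt(context: str, answer: str) -> str:
    """Construct a judge prompt asking whether the answer is supported by the context."""
    return (
        "You are grading a RAG system's answer for faithfulness.\n"
        "Context:\n" + context + "\n\n"
        "Answer:\n" + answer + "\n\n"
        "Reply with exactly 'FAITHFUL' if every claim in the answer is "
        "supported by the context, otherwise reply 'UNFAITHFUL'."
    )

def parse_verdict(judge_reply: str) -> bool:
    """Map the judge model's reply to a boolean faithfulness verdict."""
    return judge_reply.strip().upper().startswith("FAITHFUL")
```

Note the faithfulness/correctness distinction the survey draws: this judge checks only whether the answer is entailed by the retrieved context, not whether it is factually correct against ground truth, which would require a separate reference-based check.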

Extracted Content

  • 05-atom—internal-external-evaluation-distinction
  • 05-atom—faithfulness-correctness-distinction
  • 05-atom—rag-specific-attack-vectors
  • 05-atom—llm-as-evaluator-paradigm
  • 05-molecule—rag-evaluation-targets-framework
  • 05-molecule—rag-safety-evaluation-taxonomy