Internal vs External Evaluation
When evaluating complex AI systems, two fundamentally different perspectives apply:
Internal evaluation examines component-level performance and the interactions between subsystems. For RAG, this means assessing retrieval quality and generation quality as distinct but interdependent concerns.
External evaluation examines system-wide factors that matter in deployment: safety, efficiency, and practical viability. These concerns transcend any single component.
The distinction matters because optimizing one component in isolation can degrade the whole. A retriever that returns highly relevant but redundant documents may score well on internal metrics while actively harming generation quality, since near-duplicate passages take up context without giving the generator any new information.
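The redundancy case can be made concrete with a minimal sketch. The corpus, document ids, and the helper names precision_at_k and distinct_ratio_at_k are all hypothetical, and exact-match deduplication is only a crude stand-in for a real redundancy measure. The point is that a relevance-only retrieval metric cannot tell the two result lists apart.

```python
from typing import Dict, List, Set


def precision_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved document ids that are judged relevant."""
    top = retrieved[:k]
    return sum(1 for doc_id in top if doc_id in relevant) / k


def distinct_ratio_at_k(retrieved: List[str], content: Dict[str, str], k: int) -> float:
    """Fraction of the top-k results with distinct content.

    Exact string match is an assumption made for brevity; a real system
    would more likely compare embeddings or n-gram overlap.
    """
    top = retrieved[:k]
    return len({content[doc_id] for doc_id in top}) / k


# Hypothetical toy corpus: d1, d2, and d3 are copies of the same passage.
content = {
    "d1": "passage A",
    "d2": "passage A",
    "d3": "passage A",
    "d4": "passage B",
    "d5": "passage C",
}
relevant = set(content)  # assumption: every document is judged relevant

redundant_run = ["d1", "d2", "d3"]  # relevant, but three copies of one passage
diverse_run = ["d1", "d4", "d5"]    # relevant and non-overlapping

redundant_scores = (
    precision_at_k(redundant_run, relevant, 3),
    distinct_ratio_at_k(redundant_run, content, 3),
)
diverse_scores = (
    precision_at_k(diverse_run, relevant, 3),
    distinct_ratio_at_k(diverse_run, content, 3),
)

print("redundant:", redundant_scores)
print("diverse:  ", diverse_scores)
```

Both runs get the same precision, yet the redundant run hands the generator one passage's worth of information where the diverse run hands it three. Only a measure that looks past the retriever's own relevance score exposes the difference.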
This framing generalizes beyond RAG to any hybrid system where multiple AI capabilities combine.
Related: 07-molecule—ui-as-ultimate-guardrail