RAG Architecture Taxonomy

Overview

A framework for categorizing retrieval-augmented generation (RAG) systems by where architectural innovation occurs. The taxonomy maps the design space and makes the tradeoffs between approaches explicit.

Components

1. Retriever-Centric Systems

Innovation happens before generation. The retriever bears responsibility for quality.

Sub-patterns:

  • Query-driven: Refine queries before retrieval (decomposition, rewriting, reformulation)
  • Retriever-centric adaptation: Modify the retriever itself through training (e.g., fine-tuning) or architectural changes
  • Granularity-aware: Optimize the unit of retrieval (documents vs. passages vs. sentences)

Tradeoff: Preserves modularity and interpretability, but the extra pre-retrieval steps add latency, and quality remains sensitive to ambiguous queries.
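A minimal sketch of the granularity-aware sub-pattern, assuming a toy lexical scorer in place of a real dense or sparse retriever. The function names (split_units, overlap_score, retrieve) and the rule "passages are blank-line-separated paragraphs" are hypothetical illustrations, not any library's API. The same corpus is indexed at document, passage, or sentence level, and only the unit of retrieval changes.

```python
import re


def split_units(doc: str, granularity: str) -> list[str]:
    """Split one document into retrieval units at the requested granularity."""
    if granularity == "document":
        return [doc]
    if granularity == "passage":
        # Assumption: passages are blank-line-separated paragraphs.
        return [p.strip() for p in doc.split("\n\n") if p.strip()]
    if granularity == "sentence":
        # Naive splitter: break after '.', '!' or '?' followed by whitespace.
        return [s.strip() for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip()]
    raise ValueError(f"unknown granularity: {granularity!r}")


def overlap_score(query: str, unit: str) -> float:
    """Toy relevance score: fraction of query terms that appear in the unit."""
    q_terms = set(re.findall(r"[a-z0-9]+", query.lower()))
    u_terms = set(re.findall(r"[a-z0-9]+", unit.lower()))
    return len(q_terms & u_terms) / len(q_terms) if q_terms else 0.0


def retrieve(query: str, docs: list[str], granularity: str, k: int = 1) -> list[str]:
    """Index every unit at one granularity and return the top-k units by score."""
    units = [u for doc in docs for u in split_units(doc, granularity)]
    return sorted(units, key=lambda u: overlap_score(query, u), reverse=True)[:k]


docs = ["Paris is the capital of France. It hosts the Louvre.\n\n"
        "Berlin is the capital of Germany."]
print(retrieve("capital of Germany", docs, "sentence"))
# ['Berlin is the capital of Germany.']
```

At sentence level the retriever returns only the supporting fact; at document level it returns the whole text, including the irrelevant sentences about France. Finer units are more precise but can strip away context the generator needs.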

2. Generator-Centric Systems

Innovation happens during generation. The generator compensates for retrieval imperfections.

Sub-patterns:

  • Faithfulness-aware decoding: Self-reflection, verification, or correction during generation
  • Context compression: Compress retrieved content into denser representations (e.g., summaries or selected sentences)
  • Retrieval-guided generation: Modulate generation using retrieval metadata (e.g., relevance scores)

Tradeoff: Can recover from suboptimal retrieval, but requires more sophisticated generation architectures.
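A minimal sketch of context compression, assuming a simple extractive approach (keep the sentences that share the most content terms with the query) rather than a learned compressor. The stopword list and the names content_terms and compress_context are hypothetical and chosen only for illustration.

```python
import re

# Assumption: a tiny illustrative stopword list; real systems would use a proper one.
STOPWORDS = {"the", "a", "an", "is", "was", "when", "did", "of", "in", "what"}


def content_terms(text: str) -> set[str]:
    """Lowercased alphanumeric tokens, minus stopwords."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def compress_context(query: str, passages: list[str], max_sentences: int = 2) -> str:
    """Extractive compression: keep the sentences that share the most query terms."""
    q_terms = content_terms(query)
    scored = []  # (overlap, passage index, sentence index, sentence)
    for p_idx, passage in enumerate(passages):
        for s_idx, sent in enumerate(re.split(r"(?<=[.!?])\s+", passage.strip())):
            overlap = len(q_terms & content_terms(sent))
            if overlap > 0:  # drop sentences that share no query terms at all
                scored.append((overlap, p_idx, s_idx, sent))
    # Keep the highest-overlap sentences, then restore their original reading order.
    kept = sorted(scored, key=lambda t: -t[0])[:max_sentences]
    kept.sort(key=lambda t: (t[1], t[2]))
    return " ".join(t[3] for t in kept)


passages = [
    "The Eiffel Tower opened in 1889. Tickets can be bought online.",
    "Paris has many museums. The tower is 330 metres tall.",
]
print(compress_context("When was the Eiffel Tower opened?", passages, max_sentences=1))
# The Eiffel Tower opened in 1889.
```

Lexical overlap is brittle (a query saying "open" would miss "opened"), which is one reason published compressors tend to use learned scoring instead.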

3. Hybrid Systems

Innovation spans both retriever and generator through tight coupling.

Sub-patterns:

  • Iterative/multi-round retrieval: Interleave retrieval and generation across reasoning steps
  • Utility-driven optimization: Align retriever outputs with generation objectives end-to-end
  • Dynamic retrieval triggering: Decide when to retrieve based on model uncertainty

Tradeoff: Potentially the most powerful, but hardest to train, debug, and deploy, because the retriever and generator must be coordinated.
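A minimal sketch combining dynamic retrieval triggering with multi-round retrieval, assuming the generator exposes a confidence score in [0, 1]. The Retriever and Generator interfaces, the threshold of 0.7, and the heuristic of appending the draft answer to the next query are hypothetical choices for illustration. The toy functions stand in for a real retriever and LLM.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces: a retriever maps a query to passages; a generator maps
# (question, context) to a draft answer plus a confidence score in [0, 1].
Retriever = Callable[[str], List[str]]
Generator = Callable[[str, List[str]], Tuple[str, float]]


def adaptive_rag(question: str, retrieve: Retriever, generate: Generator,
                 threshold: float = 0.7, max_rounds: int = 3) -> Tuple[str, int]:
    """Retrieve only while the generator is unsure, for at most max_rounds rounds.

    Returns the final answer and the number of retrieval rounds used.
    """
    context: List[str] = []
    answer, confidence = generate(question, context)  # first attempt: no retrieval
    rounds = 0
    while confidence < threshold and rounds < max_rounds:
        # Heuristic: condition the next query on the current draft answer.
        context.extend(retrieve(f"{question} {answer}"))
        rounds += 1
        answer, confidence = generate(question, context)
    return answer, rounds


# Toy stand-ins for a real retriever and LLM.
def toy_retrieve(query: str) -> List[str]:
    return ["Canberra is the capital of Australia."]


def toy_generate(question: str, context: List[str]) -> Tuple[str, float]:
    if any("Canberra" in passage for passage in context):
        return "Canberra", 0.9
    return "Sydney?", 0.3  # low-confidence guess made without evidence


print(adaptive_rag("What is the capital of Australia?", toy_retrieve, toy_generate))
# ('Canberra', 1)
```

The max_rounds cap bounds latency when retrieval never raises confidence. Tuning the threshold and the round cap jointly is part of the coordination cost noted in the tradeoff.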

4. Robustness-Oriented Systems

Innovation targets failure modes under adversarial or degraded conditions.

Sub-patterns:

  • Noise-adaptive training: Expose models to perturbed, irrelevant, or misleading contexts
  • Hallucination-aware constraints: Enforce grounding during decoding
  • Adversarial defenses: Protect against corpus poisoning and semantic backdoors

Tradeoff: Essential for production but adds training and inference overhead.
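A minimal sketch of noise-adaptive training data construction, assuming the gold passage is mixed with sampled distractors and sometimes withheld. The name make_noisy_example, the drop_gold_prob parameter, and the abstention target "I don't know" are hypothetical; abstaining when evidence is missing is one possible training target, not the only one.

```python
import random
from typing import List, Optional

ABSTAIN = "I don't know"  # assumption: abstention target when the evidence is missing


def make_noisy_example(question: str, answer: str, gold: str,
                       distractor_pool: List[str], n_distractors: int = 2,
                       drop_gold_prob: float = 0.0,
                       seed: Optional[int] = None) -> dict:
    """Build one training example whose context mixes the gold passage with noise."""
    rng = random.Random(seed)
    # Irrelevant or misleading passages sampled from a pool (e.g. hard negatives).
    contexts = rng.sample(distractor_pool, k=min(n_distractors, len(distractor_pool)))
    keep_gold = rng.random() >= drop_gold_prob
    if keep_gold:
        contexts.append(gold)
    rng.shuffle(contexts)  # vary the gold position so the model cannot rely on it
    # Without the gold passage, the target teaches the model to abstain, not guess.
    target = answer if keep_gold else ABSTAIN
    return {"question": question, "contexts": contexts, "target": target}


pool = ["The Great Wall is in China.", "Sydney is Australia's largest city.",
        "Melbourne hosted the 1956 Olympics."]
gold = "Canberra is the capital of Australia."
ex = make_noisy_example("What is the capital of Australia?", "Canberra", gold, pool, seed=0)
print(ex["target"])
# Canberra
```

Raising drop_gold_prob trades answer coverage for robustness: the model sees more examples where the retrieved context cannot support an answer.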

When to Use

  • Diagnosing existing systems: Map the current architecture to see where improvements might have the most impact
  • Designing new systems: Choose an approach based on the primary constraint (latency vs. accuracy vs. robustness)
  • Understanding literature: Quickly categorize new papers and their contributions

Limitations

  • Many real systems are hybrids that don’t fit cleanly into one category
  • The taxonomy emphasizes architecture over data quality, which is often more important in practice
  • Doesn’t capture deployment concerns like cost, scaling, or maintenance burden

Related: 05-atom—rag-core-equation, 05-molecule—rag-evaluation-dimensions, 05-atom—rag-seven-failure-points