RAG Architecture Taxonomy
Overview
A framework for categorizing retrieval-augmented generation (RAG) systems by where architectural innovation occurs: in the retriever, in the generator, across both, or in robustness mechanisms. The taxonomy maps the design space and clarifies the tradeoffs between approaches.
Components
1. Retriever-Centric Systems
Innovation happens before generation. The retriever bears responsibility for quality.
Sub-patterns:
- Query-driven: Refine queries before retrieval (decomposition, rewriting, reformulation)
- Retriever-centric adaptation: Modify the retriever itself through learning or architecture changes
- Granularity-aware: Optimize the unit of retrieval (documents vs. passages vs. sentences)
Tradeoff: Preserves modularity and interpretability, but introduces latency and sensitivity to ambiguous queries.
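As an illustration of the query-driven sub-pattern, here is a minimal sketch of query decomposition in front of a retriever. The rule-based `decompose` (splitting on "and"), the token-overlap `retrieve`, and the three-passage corpus are all assumptions made for this example. A real system would more likely use a learned rewriter and a sparse or dense index.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercase word tokens; a stand-in for a real analyzer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def decompose(query: str) -> list[str]:
    """Hypothetical rule-based decomposition: split the query on 'and'."""
    parts = [p.strip(" ?") for p in re.split(r"\band\b", query)]
    return [p for p in parts if p]

def retrieve(query: str, corpus: list[str], k: int = 1) -> list[str]:
    """Toy lexical retriever: rank passages by token overlap with the query."""
    q = tokenize(query)
    ranked = sorted(corpus, key=lambda p: len(q & tokenize(p)), reverse=True)
    return ranked[:k]

def retrieve_decomposed(query: str, corpus: list[str], k: int = 1) -> list[str]:
    """Retrieve once per sub-query and merge results without duplicates."""
    seen, results = set(), []
    for sub_query in decompose(query):
        for passage in retrieve(sub_query, corpus, k):
            if passage not in seen:
                seen.add(passage)
                results.append(passage)
    return results

corpus = [
    "BM25 is a sparse lexical ranking function.",
    "Dense retrievers embed queries and passages into vectors.",
    "Rerankers rescore a candidate list with a cross-encoder.",
]
hits = retrieve_decomposed("What is BM25 and how do dense retrievers work?", corpus)
print(hits)
```

All the added work happens before generation, so the generator stays untouched. The cost is that each sub-query adds a retrieval call, which is where the latency in the tradeoff comes from.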
2. Generator-Centric Systems
Innovation happens during generation. The generator compensates for retrieval imperfections.
Sub-patterns:
- Faithfulness-aware decoding: Self-reflection, verification, or correction during generation
- Context compression: Condense retrieved content into denser representations
- Retrieval-guided generation: Modulate generation based on retrieval metadata
Tradeoff: Can recover from suboptimal retrieval, but requires more sophisticated generation architectures.
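Context compression can be sketched, under strong simplifying assumptions, as extractive sentence selection: keep only the sentences that share the most terms with the query. The `compress` function, its overlap scoring, and the sample passages are hypothetical. Learned compressors or summarizers would replace the scoring step in practice.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercase word tokens; a stand-in for a real analyzer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def compress(query: str, passages: list[str], max_sentences: int = 2) -> str:
    """Keep the sentences with the highest query-term overlap, in original order."""
    q = tokenize(query)
    sentences = []
    for passage in passages:
        # Split on whitespace that follows sentence-ending punctuation.
        sentences.extend(s for s in re.split(r"(?<=[.!?])\s+", passage) if s.strip())
    scored = [(len(q & tokenize(s)), i, s) for i, s in enumerate(sentences)]
    # Highest overlap first; ties broken by original position.
    top = sorted(scored, key=lambda t: (-t[0], t[1]))[:max_sentences]
    kept = sorted((t for t in top if t[0] > 0), key=lambda t: t[1])
    return " ".join(s for _, _, s in kept)

passages = [
    "The Eiffel Tower is in Paris. The tower opened in 1889. Tickets are sold online.",
    "Paris is the capital of France. The Louvre is also in Paris.",
]
context = compress("When was the Eiffel Tower opened?", passages)
print(context)
```

The generator then sees two sentences instead of five, which is the sense in which it compensates for a retriever that returned more than was needed.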
3. Hybrid Systems
Innovation spans both retriever and generator through tight coupling.
Sub-patterns:
- Iterative/multi-round retrieval: Interleave retrieval and generation across reasoning steps
- Utility-driven optimization: Align retriever outputs with generation objectives end-to-end
- Dynamic retrieval triggering: Decide when to retrieve based on model uncertainty
Tradeoff: Most powerful, but hardest to train, debug, and deploy because the retriever and generator must be coordinated.
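Dynamic retrieval triggering can be sketched with an entropy test on the generator's next-token distribution: retrieve only when the distribution is flat enough to suggest uncertainty. The threshold of 1.0 nat and the callables `next_token_probs` and `retrieve` are hypothetical placeholders, not the interface of any particular model or library.

```python
import math
from typing import Callable, Sequence

def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def should_retrieve(probs: Sequence[float], threshold: float = 1.0) -> bool:
    """Trigger retrieval when the next-token distribution is too flat.

    The default threshold of 1.0 nat is an arbitrary illustrative value.
    """
    return entropy(probs) > threshold

def gated_context(question: str,
                  next_token_probs: Callable[[str], Sequence[float]],
                  retrieve: Callable[[str], list[str]]) -> list[str]:
    """Call the (hypothetical) retriever only when the model looks uncertain."""
    if should_retrieve(next_token_probs(question)):
        return retrieve(question)
    return []

confident = [0.97, 0.01, 0.01, 0.01]   # peaked: the model is fairly sure
uncertain = [0.25, 0.25, 0.25, 0.25]   # flat: entropy is ln(4), about 1.39 nats

print(should_retrieve(confident), should_retrieve(uncertain))
```

The coupling is visible even in this toy version: the retriever's behavior now depends on the generator's internal state, so a failure may need to be debugged across both components.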
4. Robustness-Oriented Systems
Innovation targets failure modes under adversarial or degraded conditions.
Sub-patterns:
- Noise-adaptive training: Expose models to perturbed, irrelevant, or misleading contexts
- Hallucination-aware constraints: Enforce grounding during decoding
- Adversarial defenses: Protect against corpus poisoning and semantic backdoors
Tradeoff: Essential for production but adds training and inference overhead.
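One way to picture noise-adaptive training is as a data-construction step: each training example pairs the gold passage with sampled distractors, so the model learns to ignore irrelevant context. The function `make_noisy_example`, its field names, and the sample passages are assumptions for this sketch. The sampling here is uniform, whereas real pipelines often mine hard negatives instead.

```python
import random

def make_noisy_example(question: str, gold: str, distractor_pool: list[str],
                       n_noise: int = 2, seed: int = 0) -> dict:
    """Mix the gold passage with random distractors and shuffle their order."""
    rng = random.Random(seed)  # seeded so the example is reproducible
    candidates = [p for p in distractor_pool if p != gold]
    noise = rng.sample(candidates, k=min(n_noise, len(candidates)))
    contexts = [gold] + noise
    rng.shuffle(contexts)  # avoid teaching the model a positional shortcut
    return {
        "question": question,
        "contexts": contexts,
        "gold_index": contexts.index(gold),
    }

pool = [
    "The Nile flows through Egypt.",
    "Mount Everest is in the Himalayas.",
    "The Amazon is the largest rainforest.",
]
example = make_noisy_example(
    "Which river flows through Egypt?",
    gold="The Nile flows through Egypt.",
    distractor_pool=pool,
)
print(example["contexts"], example["gold_index"])
```

The overhead in the tradeoff shows up directly: every example now carries `n_noise` extra passages through training.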
When to Use
- Diagnosing existing systems: Map the current architecture onto the taxonomy to see where improvements are likely to have the most impact
- Designing new systems: Choose approach based on primary constraints (latency vs. accuracy vs. robustness)
- Understanding literature: Quickly categorize new papers and their contributions
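For the diagnostic use, a minimal sketch might encode the taxonomy as a lookup table and count which categories a system's techniques fall into. The enum, the table keys (shortened forms of the sub-pattern names above), and the `categorize` helper are illustrative assumptions, not part of the taxonomy itself.

```python
from enum import Enum

class Category(Enum):
    RETRIEVER = "retriever-centric"
    GENERATOR = "generator-centric"
    HYBRID = "hybrid"
    ROBUSTNESS = "robustness-oriented"

# Sub-pattern names from the taxonomy, mapped to their parent category.
SUB_PATTERNS = {
    "query-driven": Category.RETRIEVER,
    "retriever adaptation": Category.RETRIEVER,
    "granularity-aware": Category.RETRIEVER,
    "faithfulness-aware decoding": Category.GENERATOR,
    "context compression": Category.GENERATOR,
    "retrieval-guided generation": Category.GENERATOR,
    "iterative retrieval": Category.HYBRID,
    "utility-driven optimization": Category.HYBRID,
    "dynamic retrieval triggering": Category.HYBRID,
    "noise-adaptive training": Category.ROBUSTNESS,
    "hallucination-aware constraints": Category.ROBUSTNESS,
    "adversarial defenses": Category.ROBUSTNESS,
}

def categorize(techniques: list[str]) -> dict[Category, int]:
    """Count how many of a system's techniques fall into each category."""
    counts: dict[Category, int] = {}
    for technique in techniques:
        category = SUB_PATTERNS.get(technique)
        if category is not None:  # unknown techniques are ignored
            counts[category] = counts.get(category, 0) + 1
    return counts

profile = categorize(["query-driven", "granularity-aware", "context compression"])
for category, count in profile.items():
    print(category.value, count)
```

A profile that spans several categories is a normal result, since real systems often mix techniques.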
Limitations
- Many real systems combine techniques from several categories and don’t fit cleanly into one
- The taxonomy emphasizes architecture over data quality, which is often more important in practice
- Doesn’t capture deployment concerns like cost, scaling, or maintenance burden
Related: 05-atom—rag-core-equation, 05-molecule—rag-evaluation-dimensions, 05-atom—rag-seven-failure-points