Dynamic Retrieval Triggering
Context
Most RAG systems retrieve on every query, regardless of whether retrieval is needed. This wastes compute, adds latency, and can introduce noise that degrades output quality. Static retrieval policies don’t match the variable information needs of different queries.
Problem
How do you decide when to retrieve external evidence versus relying on the model’s parametric knowledge?
Solution
Use model uncertainty signals to trigger retrieval dynamically. Retrieve only when the model signals low confidence in answering from its parametric knowledge.
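The core decision can be sketched as a gate around generation. This is a minimal illustration, not a reference implementation: `generate` and `retrieve` are hypothetical callables supplied by the caller, the per-token probabilities are assumed to be available from the model, and the `min_token_prob` default of 0.4 is an arbitrary placeholder that would need tuning.

```python
from typing import Callable, List, Tuple

# Hypothetical shape: a generator returns (text, per-token probabilities).
Generation = Tuple[str, List[float]]


def answer_with_dynamic_retrieval(
    query: str,
    generate: Callable[[str], Generation],
    retrieve: Callable[[str], List[str]],
    min_token_prob: float = 0.4,  # assumed placeholder, tune per domain
) -> Tuple[str, bool]:
    """Return (answer, retrieved_flag).

    Drafts an answer from parametric knowledge first and retrieves only
    if any token in the draft falls below `min_token_prob`.
    """
    draft, token_probs = generate(query)
    # An empty draft gives no evidence of confidence, so treat it as uncertain.
    confident = bool(token_probs) and min(token_probs) >= min_token_prob
    if confident:
        return draft, False

    # Low confidence: fetch evidence and regenerate with it in the prompt.
    passages = retrieve(query)
    prompt = "\n".join(passages) + "\n\nQuestion: " + query
    grounded, _ = generate(prompt)
    return grounded, True
```

Using the minimum token probability is only one possible signal; mean log-probability or entropy would slot into the same gate.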
Implementation approaches:
- Entropy-based triggers (DRAGIN): Monitor token-level entropy during generation. High entropy suggests a knowledge gap; when it appears, trigger retrieval and reformulate the query based on the model’s self-attention patterns.
- Retrieval-necessity judgments (TA-ARE): Have the model itself judge whether a query needs retrieval, guided by in-context examples and the current date. This replaces static thresholds and needs no calibration or additional training.
- Forward-looking retrieval (FLARE): Generate a tentative next sentence as a lookahead. If it contains low-confidence tokens, retrieve using that sentence as the query and regenerate it with the retrieved evidence.
- Self-routing (SELF-ROUTE): Let the model assess task difficulty and route accordingly between retrieval and generation paths.
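As an illustration of the entropy-based approach, the sketch below computes the Shannon entropy of each next-token distribution and reports the first generation step that crosses a threshold. It covers only the entropy check: the published DRAGIN method combines entropy with further signals, and the function names and threshold here are assumptions made for the example.

```python
import math
from typing import List, Optional, Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def first_trigger_position(
    step_distributions: List[Sequence[float]],
    entropy_threshold: float,
) -> Optional[int]:
    """Index of the first step whose entropy exceeds the threshold, else None."""
    for i, probs in enumerate(step_distributions):
        if token_entropy(probs) > entropy_threshold:
            return i
    return None


# Step 0 is sharply peaked (low entropy); step 1 is uniform over four tokens.
steps = [
    [0.97, 0.01, 0.01, 0.01],
    [0.25, 0.25, 0.25, 0.25],
]
trigger_at = first_trigger_position(steps, entropy_threshold=1.0)
```

Here `trigger_at` is 1, because the uniform distribution has entropy ln 4 (about 1.39 nats), above the assumed threshold of 1.0. In a real system the distributions would come from the model’s softmax output at each decoding step.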
Consequences
Benefits:
- Reduces redundant retrievals (14.9% reduction reported with TA-ARE)
- Improves latency for queries that don’t need retrieval
- Can improve quality by avoiding retrieval noise on well-known topics
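The latency benefit can be made concrete with a rough cost model. The symbols are assumptions introduced for this example, not taken from any of the cited methods: p is the fraction of queries that trigger retrieval, T_gen is generation latency, T_ret is the added latency of retrieval (including any regeneration with the retrieved evidence), and T_trig is the overhead of evaluating the trigger.

```latex
\begin{align}
\mathbb{E}[T_{\text{static}}]  &= T_{\text{gen}} + T_{\text{ret}} \\
\mathbb{E}[T_{\text{dynamic}}] &= T_{\text{gen}} + T_{\text{trig}} + p \, T_{\text{ret}} \\
\mathbb{E}[T_{\text{dynamic}}] < \mathbb{E}[T_{\text{static}}]
  &\iff T_{\text{trig}} < (1 - p) \, T_{\text{ret}}
\end{align}
```

For example, if retrieval adds 200 ms and 60% of queries still trigger it, the trigger pays for itself on average only if it costs less than 0.4 × 200 = 80 ms per query.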
Tradeoffs:
- Adds inference complexity
- Risk of under-retrieval (failing to retrieve when it would help)
- Trigger mechanisms require tuning per domain
- Token-level approaches like DRAGIN have high inference cost
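Per-domain tuning can be approached by sweeping the trigger threshold over a labeled validation set and watching both failure modes at once. The sketch below assumes each example has already been labeled with whether retrieval actually helped; how those labels are obtained is outside its scope, and the function names are hypothetical.

```python
from typing import Iterable, List, Tuple


def trigger_error_rates(
    examples: List[Tuple[float, bool]],
    threshold: float,
) -> Tuple[float, float]:
    """Return (under_retrieval_rate, redundant_retrieval_rate).

    Each example is (uncertainty_score, retrieval_helps). Retrieval is
    assumed to fire when uncertainty_score > threshold.
    """
    needed = [score for score, helps in examples if helps]
    not_needed = [score for score, helps in examples if not helps]
    # Under-retrieval: retrieval would have helped but did not fire.
    missed = sum(1 for score in needed if score <= threshold)
    # Redundant retrieval: retrieval fired although it would not have helped.
    wasted = sum(1 for score in not_needed if score > threshold)
    under = missed / len(needed) if needed else 0.0
    redundant = wasted / len(not_needed) if not_needed else 0.0
    return under, redundant


def sweep(
    examples: List[Tuple[float, bool]],
    thresholds: Iterable[float],
) -> List[Tuple[float, float, float]]:
    """Rows of (threshold, under_retrieval_rate, redundant_retrieval_rate)."""
    return [(t, *trigger_error_rates(examples, t)) for t in thresholds]
```

Raising the threshold trades redundant retrievals for missed ones, so the right operating point depends on which error is more costly in the target domain.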
When it works well:
- Mixed query loads with varying information needs
- Latency-sensitive applications
- Domains where the model has strong parametric knowledge for some topics
When it struggles:
- Consistently knowledge-intensive domains (when nearly every query needs evidence, retrieving on every query is simpler and the trigger is pure overhead)
- When uncertainty signals don’t correlate with actual knowledge gaps
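Whether an uncertainty signal tracks real knowledge gaps can be checked before deploying a trigger. One option, sketched below, is the AUROC of the uncertainty score against answer correctness on a held-out set. Treating a wrong no-retrieval answer as a proxy for a knowledge gap is an assumption of this example, and the function is a plain pairwise implementation rather than a library call.

```python
from typing import List


def auroc(uncertainty: List[float], was_wrong: List[bool]) -> float:
    """Probability that a wrong answer has higher uncertainty than a correct one.

    Ties count as half. A value near 0.5 means the signal carries little
    information about correctness; a value near 1.0 means it separates them well.
    """
    wrong = [u for u, w in zip(uncertainty, was_wrong) if w]
    right = [u for u, w in zip(uncertainty, was_wrong) if not w]
    if not wrong or not right:
        raise ValueError("need at least one wrong and one correct answer")
    wins = 0.0
    for uw in wrong:
        for ur in right:
            if uw > ur:
                wins += 1.0
            elif uw == ur:
                wins += 0.5
    return wins / (len(wrong) * len(right))
```

A score close to 0.5 on the target domain suggests the trigger would fire more or less at random, which is the failure case described above.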
Related: 05-molecule—rag-architecture-taxonomy, 05-atom—rag-core-equation, 05-atom—retrieval-noise-paradox