Cascaded Retrieval Pattern

Context

Retrieval systems face a fundamental tradeoff: broad recall (finding everything relevant) versus precision (finding only what’s relevant). Optimizing for both simultaneously is computationally expensive at scale.

Problem

You need high-quality retrieval results, but can’t afford to run expensive ranking operations over the entire corpus for every query.

Solution

Structure retrieval as a cascade: a fast, high-recall first stage followed by a slower, precision-oriented re-ranking stage over a reduced candidate set.

Stage 1: Recall-Oriented

  • Use cheap, fast methods (graph traversal, approximate nearest neighbor)
  • Accept false positives (better to include irrelevant items than to miss relevant ones)
  • Output: large candidate set (hundreds to thousands of items); see the sketch below
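
A minimal sketch of the recall stage, in Python. Brute-force dot products over low-dimensional “coarse” vectors stand in for a real ANN index (faiss, hnswlib, and similar) so the example stays self-contained; the names `recall_stage` and `coarse_index` are illustrative, not from any library.

```python
import numpy as np

def recall_stage(query_vec, coarse_index, n_candidates=500):
    """Stage 1: cheap, high-recall candidate generation.

    coarse_index is a (corpus_size, d_small) matrix of low-dimensional
    document vectors; assumes corpus_size > n_candidates.
    """
    scores = coarse_index @ query_vec                    # one cheap matmul
    # argpartition finds the top-N set in O(n), without a full sort
    top = np.argpartition(-scores, n_candidates)[:n_candidates]
    return top[np.argsort(-scores[top])]                 # sort only the candidates
```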

Stage 2: Precision-Oriented

  • Apply expensive operations only to the candidate set
  • Dense similarity scoring, neural re-ranking, or hybrid fusion
  • Output: ranked list for consumption (sketch below)
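
Continuing the sketch, a hypothetical `rerank_stage` applies the expensive scoring only to the stage 1 candidates. Plain cosine similarity over full-dimensional embeddings stands in for a neural re-ranker; the point is that the costly operation touches only `len(candidate_ids)` items, never the whole corpus.

```python
def rerank_stage(query_vec_full, candidate_ids, full_index, top_k=10):
    """Stage 2: expensive scoring over the reduced candidate set only.

    full_index is a (corpus_size, d_full) matrix of high-dimensional
    embeddings; candidate_ids comes from recall_stage above.
    """
    cands = full_index[candidate_ids]
    sims = cands @ query_vec_full / (
        np.linalg.norm(cands, axis=1) * np.linalg.norm(query_vec_full) + 1e-9
    )
    return candidate_ids[np.argsort(-sims)[:top_k]]

# Wiring the cascade together (names from the sketches above):
# candidates = recall_stage(q_coarse, coarse_index, n_candidates=500)
# results    = rerank_stage(q_full, candidates, full_index, top_k=10)
```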

Fusion Methods

  • Reciprocal Rank Fusion (RRF) combines multiple ranking signals
  • Each signal contributes its own ranking; RRF merges them by rank position rather than by raw score
  • No learned parameters (works out of the box); see the sketch below
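
RRF scores each document as the sum over rankers of 1 / (k + rank), where rank is the document’s 1-based position in that ranker’s list; k = 60 is the constant from the original paper (Cormack et al., 2009). A minimal sketch:

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: merge ranked lists by rank position.

    rankings is a list of ranked doc-id lists, best first, e.g. one
    from a lexical retriever and one from a dense retriever.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# e.g. rrf_fuse([bm25_ids, dense_ids]) merges lexical and dense rankings
```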

Consequences

Benefits:

  • Scales to large corpora without proportional cost increase
  • Combines strengths of multiple retrieval approaches
  • Graceful degradation (if stage 1 returns good candidates, stage 2 just needs to rank them)

Costs:

  • Recall ceiling is set by stage 1 (an expensive stage 2 can’t recover items stage 1 missed)
  • Requires tuning the candidate set size (too small = missed relevance; too large = defeated purpose); a recall sweep sketch follows this list
  • Multiple moving parts to maintain
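
One way to choose the candidate set size empirically is to sweep N against labeled relevance judgments and read off the stage 1 recall ceiling. A sketch, assuming a hypothetical `recall_fn(query_vec, n_candidates)` wrapper around the stage 1 retriever and a small labeled set:

```python
def stage1_recall_at_n(queries, relevant, recall_fn, sizes=(100, 250, 500, 1000)):
    """Sweep candidate-set sizes and report stage 1 recall at each.

    queries  maps query_id -> query vector
    relevant maps query_id -> set of relevant doc ids (labeled data)
    """
    for n in sizes:
        hits = total = 0
        for qid, qvec in queries.items():
            candidates = set(recall_fn(qvec, n_candidates=n).tolist())
            hits += len(candidates & relevant[qid])
            total += len(relevant[qid])
        print(f"N={n}: stage-1 recall = {hits / total:.3f}")
```

Because stage 2 can never rank an item stage 1 dropped, this curve bounds end-to-end quality: pick the smallest N where recall flattens out.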

When This Applies

Any retrieval system where corpus size makes exhaustive ranking impractical. The pattern appears in web search, recommendation systems, and now RAG architectures.

Related: 06-atom—multi-granular-embeddings, 07-molecule—vectors-vs-graphs