Cascaded Retrieval Pattern

Context

Retrieval systems face a fundamental tradeoff: broad recall (finding everything relevant) versus precision (finding only what’s relevant). Optimizing for both simultaneously is computationally expensive at scale.

Problem

You need high-quality retrieval results, but can’t afford to run expensive ranking operations over the entire corpus for every query.

Solution

Structure retrieval as a cascade: a fast, high-recall first stage followed by a slower, precision-oriented re-ranking stage over a reduced candidate set.

Stage 1: Recall-Oriented

  • Use cheap, fast methods (graph traversal, approximate nearest neighbor)
  • Accept false positives (better to include irrelevant items than to miss relevant ones)
  • Output: large candidate set (hundreds to thousands of items); see the sketch below
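
A minimal sketch of the recall stage, in Python. Brute-force dot products over low-dimensional “coarse” vectors stand in for a real ANN index (faiss, hnswlib, and similar) so the example stays self-contained; the names `recall_stage` and `coarse_index` are illustrative, not from any library.

```python
import numpy as np

def recall_stage(query_vec, coarse_index, n_candidates=500):
    """Stage 1: cheap, high-recall candidate generation.

    coarse_index is a (corpus_size, d_small) matrix of low-dimensional
    document vectors; assumes corpus_size > n_candidates.
    """
    scores = coarse_index @ query_vec                    # one cheap matmul
    # argpartition finds the top-N set in O(n), without a full sort
    top = np.argpartition(-scores, n_candidates)[:n_candidates]
    return top[np.argsort(-scores[top])]                 # sort only the candidates
```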

Stage 2: Precision-Oriented

  • Apply expensive operations only to the candidate set
  • Dense similarity scoring, neural re-ranking, or hybrid fusion
  • Output: ranked list for consumption (sketch below)
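
Continuing the sketch, a hypothetical `rerank_stage` applies the expensive scoring only to the stage 1 candidates. Plain cosine similarity over full-dimensional embeddings stands in for a neural re-ranker; the point is that the costly operation touches only `len(candidate_ids)` items, never the whole corpus.

```python
def rerank_stage(query_vec_full, candidate_ids, full_index, top_k=10):
    """Stage 2: expensive scoring over the reduced candidate set only.

    full_index is a (corpus_size, d_full) matrix of high-dimensional
    embeddings; candidate_ids comes from recall_stage above.
    """
    cands = full_index[candidate_ids]
    sims = cands @ query_vec_full / (
        np.linalg.norm(cands, axis=1) * np.linalg.norm(query_vec_full) + 1e-9
    )
    return candidate_ids[np.argsort(-sims)[:top_k]]

# Wiring the cascade together (names from the sketches above):
# candidates = recall_stage(q_coarse, coarse_index, n_candidates=500)
# results    = rerank_stage(q_full, candidates, full_index, top_k=10)
```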

Fusion Methods

  • Reciprocal Rank Fusion (RRF) combines multiple ranking signals
  • Each signal contributes its own ranking; RRF merges them by rank position rather than by raw score
  • No learned parameters (works out of the box); see the sketch below
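
RRF scores each document as the sum over rankers of 1 / (k + rank), where rank is the document’s 1-based position in that ranker’s list; k = 60 is the constant from the original paper (Cormack et al., 2009). A minimal sketch:

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: merge ranked lists by rank position.

    rankings is a list of ranked doc-id lists, best first, e.g. one
    from a lexical retriever and one from a dense retriever.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# e.g. rrf_fuse([bm25_ids, dense_ids]) merges lexical and dense rankings
```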

Consequences

Benefits:

  • Scales to large corpora without proportional cost increase
  • Combines strengths of multiple retrieval approaches
  • Graceful degradation (if stage 1 returns good candidates, stage 2 just needs to rank them)

Costs:

  • Recall ceiling is set by stage 1 (an expensive stage 2 can’t recover items stage 1 missed)
  • Requires tuning the candidate set size (too small = missed relevance; too large = defeated purpose); a recall sweep sketch follows this list
  • Multiple moving parts to maintain
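
One way to choose the candidate set size empirically is to sweep N against labeled relevance judgments and read off the stage 1 recall ceiling. A sketch, assuming a hypothetical `recall_fn(query_vec, n_candidates)` wrapper around the stage 1 retriever and a small labeled set:

```python
def stage1_recall_at_n(queries, relevant, recall_fn, sizes=(100, 250, 500, 1000)):
    """Sweep candidate-set sizes and report stage 1 recall at each.

    queries  maps query_id -> query vector
    relevant maps query_id -> set of relevant doc ids (labeled data)
    """
    for n in sizes:
        hits = total = 0
        for qid, qvec in queries.items():
            candidates = set(recall_fn(qvec, n_candidates=n).tolist())
            hits += len(candidates & relevant[qid])
            total += len(relevant[qid])
        print(f"N={n}: stage-1 recall = {hits / total:.3f}")
```

Because stage 2 can never rank an item stage 1 dropped, this curve bounds end-to-end quality: pick the smallest N where recall flattens out.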

When This Applies

Any retrieval system where corpus size makes exhaustive ranking impractical. The pattern appears in web search, recommendation systems, and now RAG architectures.

Related: 06-atom—multi-granular-embeddings, 07-molecule—vectors-vs-graphs