RAG Core Equation

Retrieval-Augmented Generation can be expressed mathematically as:

P(y|x) ≈ Σᵢ P(y|x, dᵢ) · P(dᵢ|x)

Where:

x is the input (query or prompt)
dᵢ is the i-th retrieved document from corpus C (the sum runs over the retrieved documents, not the whole corpus)
y is the generated response

This decomposition shows that RAG performance depends on two distinct factors:

P(dᵢ|x), retrieval relevance: how well does the document match the query?
P(y|x, dᵢ), generation quality: how well does the model use the document to answer?
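
As a rough illustration of the weighted sum, here is a minimal Python sketch. The function name `rag_marginal` and all numbers are made up for illustration. It assumes you already have a retrieval probability and an answer probability for each retrieved document, which real systems usually have to estimate.

```python
from typing import Sequence


def rag_marginal(p_doc_given_x: Sequence[float],
                 p_y_given_x_doc: Sequence[float]) -> float:
    """Approximate P(y|x) as the sum over retrieved docs of P(y|x, d_i) * P(d_i|x)."""
    if len(p_doc_given_x) != len(p_y_given_x_doc):
        raise ValueError("need one answer probability per retrieved document")
    total = sum(p_doc_given_x)
    if total <= 0:
        raise ValueError("retrieval probabilities must sum to a positive value")
    # Renormalize over the retrieved set so the weights sum to 1 (an assumption:
    # only the retrieved documents are considered, not the whole corpus).
    weights = [p / total for p in p_doc_given_x]
    return sum(w * p for w, p in zip(weights, p_y_given_x_doc))


# Hypothetical numbers: the same generator, two different retrievers.
answer_probs = [0.9, 0.5, 0.1]  # P(y|x, d_i) for three documents
good_retrieval = rag_marginal([0.7, 0.2, 0.1], answer_probs)  # best doc ranked first
poor_retrieval = rag_marginal([0.1, 0.2, 0.7], answer_probs)  # best doc ranked last
print(good_retrieval, poor_retrieval)
```

With the generator held fixed, moving retrieval weight away from the most useful document lowers the overall estimate, which is the cascade effect in miniature.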

The practical implication: RAG failures can originate in either component. Poor retrieval cascades into poor generation. Excellent retrieval can be wasted by a generator that ignores or misuses the evidence. Diagnosing a failure therefore requires measuring retrieval and generation separately.
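
One way to separate the two failure sources is sketched below, assuming a small labeled evaluation set. The `EvalRecord` structure, its field names, and the sample data are hypothetical. The sketch reports how often the gold document was retrieved, and how often the answer was correct in the cases where it was.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class EvalRecord:
    retrieved_ids: List[str]  # document ids returned by the retriever
    gold_id: str              # id of the document that contains the answer
    answer_correct: bool      # whether the generated answer was judged correct


def diagnose(records: List[EvalRecord]) -> Tuple[float, float]:
    """Return (retrieval hit rate, answer accuracy given a retrieval hit)."""
    if not records:
        return 0.0, 0.0
    hits = [r for r in records if r.gold_id in r.retrieved_ids]
    hit_rate = len(hits) / len(records)
    # Accuracy restricted to hits isolates the generator: the evidence was
    # available, so a wrong answer here is not a retrieval failure.
    acc_given_hit = sum(r.answer_correct for r in hits) / len(hits) if hits else 0.0
    return hit_rate, acc_given_hit


# Hypothetical evaluation records.
records = [
    EvalRecord(["d1", "d2"], "d1", True),
    EvalRecord(["d3", "d4"], "d4", False),
    EvalRecord(["d5", "d6"], "d5", False),
    EvalRecord(["d7", "d8"], "d9", False),
]
hit_rate, acc_given_hit = diagnose(records)
print(hit_rate, acc_given_hit)
```

A low hit rate points at the retriever. A high hit rate with low accuracy on hits points at the generator.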

