Retrieval-Augmented Generation (RAG)
RAG is an AI architecture that combines a neural text retriever with a text generator to improve the quality of generated responses in knowledge-intensive tasks.
Given an input query, RAG retrieves relevant passages from a large text corpus via a learned dense index, then conditions a sequence-to-sequence model on both the query and the retrieved documents. The model marginalizes over multiple retrieved passages to produce its output.
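The retrieve-then-generate loop can be sketched as follows. This is a minimal illustration, not the real system: the learned dense retriever is stood in for by a bag-of-words cosine similarity, and the sequence-to-sequence generator by a string template; an actual RAG implementation uses a trained bi-encoder with an approximate-nearest-neighbor index and a neural generator. All names (`embed`, `retrieve`, `generate`, the toy corpus) are hypothetical.

```python
import numpy as np

# Toy corpus standing in for the large text corpus behind the dense index.
corpus = [
    "the eiffel tower is in paris",
    "the great wall is in china",
    "photosynthesis occurs in chloroplasts",
]

def embed(text, vocab):
    # Bag-of-words vector, L2-normalized; a stand-in for a learned encoder.
    v = np.array([text.split().count(w) for w in vocab], dtype=float)
    n = np.linalg.norm(v)
    return v / n if n else v

vocab = sorted({w for doc in corpus for w in doc.split()})
doc_vecs = [embed(d, vocab) for d in corpus]

def retrieve(query, k=2):
    # Score every passage against the query, keep the top-k,
    # and softmax the scores into a retrieval distribution P_ret(z|x).
    q = embed(query, vocab)
    scores = np.array([q @ d for d in doc_vecs])
    top = np.argsort(-scores)[:k]
    w = np.exp(scores[top] - scores[top].max())
    return top, w / w.sum()

def generate(query, passage):
    # Placeholder for the seq2seq model: conditions on query + passage.
    return f"Q: {query} | context: {passage}"

query = "where is the eiffel tower"
idx, p_ret = retrieve(query)
for i, p in zip(idx, p_ret):
    print(f"{p:.2f}  {generate(query, corpus[i])}")
```

The key structural point survives the simplification: retrieval produces both passages and a probability over them, and the generator is conditioned on the query together with each retrieved passage.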
Formally, RAG treats retrieved documents as latent variables:
P(y|x) = Σ_i P_ret(z_i|x) × P_gen(y|x, z_i)
where P_ret is the retriever's distribution over documents and P_gen is the generator's probability of the output conditioned on the query and the retrieved passage.
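A worked instance of this marginalization, with made-up numbers: suppose the retriever proposes three passages and the generator scores the same candidate answer y against each of them.

```python
import numpy as np

# Hypothetical retriever distribution P_ret(z_i|x) over three passages.
p_ret = np.array([0.6, 0.3, 0.1])
# Hypothetical generator probabilities P_gen(y|x, z_i) for a candidate y.
p_gen = np.array([0.9, 0.2, 0.05])

# P(y|x) = Σ_i P_ret(z_i|x) × P_gen(y|x, z_i) — a dot product.
p_y = float(p_ret @ p_gen)
print(round(p_y, 3))  # 0.6*0.9 + 0.3*0.2 + 0.1*0.05 = 0.605
```

Note that y is scored under every retrieved passage and the results are weighted by how much the retriever trusts each passage, so no single retrieval has to be correct on its own.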
This creates a system with two kinds of memory: parametric memory (knowledge encoded in model weights) and non-parametric memory (external corpus accessed via retrieval).