Retrieval-Augmented Generation (RAG)
RAG is an AI architecture that combines a neural text retriever with a text generator to improve the quality of generated responses in knowledge-intensive tasks.
Given an input query, RAG retrieves relevant passages from a large text corpus via a learned dense index, then conditions a sequence-to-sequence model on both the query and the retrieved documents. The model marginalizes over multiple retrieved passages to produce its output.
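The retrieve-then-generate loop can be sketched as follows. This is a minimal illustration, not the real system: the learned dense retriever is stood in for by a bag-of-words cosine similarity, and the sequence-to-sequence generator by a string template; an actual RAG implementation uses a trained bi-encoder with an approximate-nearest-neighbor index and a neural generator. All names (`embed`, `retrieve`, `generate`, the toy corpus) are hypothetical.

```python
import numpy as np

# Toy corpus standing in for the large text corpus behind the dense index.
corpus = [
    "the eiffel tower is in paris",
    "the great wall is in china",
    "photosynthesis occurs in chloroplasts",
]

def embed(text, vocab):
    # Bag-of-words vector, L2-normalized; a stand-in for a learned encoder.
    v = np.array([text.split().count(w) for w in vocab], dtype=float)
    n = np.linalg.norm(v)
    return v / n if n else v

vocab = sorted({w for doc in corpus for w in doc.split()})
doc_vecs = [embed(d, vocab) for d in corpus]

def retrieve(query, k=2):
    # Score every passage against the query, keep the top-k,
    # and softmax the scores into a retrieval distribution P_ret(z|x).
    q = embed(query, vocab)
    scores = np.array([q @ d for d in doc_vecs])
    top = np.argsort(-scores)[:k]
    w = np.exp(scores[top] - scores[top].max())
    return top, w / w.sum()

def generate(query, passage):
    # Placeholder for the seq2seq model: conditions on query + passage.
    return f"Q: {query} | context: {passage}"

query = "where is the eiffel tower"
idx, p_ret = retrieve(query)
for i, p in zip(idx, p_ret):
    print(f"{p:.2f}  {generate(query, corpus[i])}")
```

The key structural point survives the simplification: retrieval produces both passages and a probability over them, and the generator is conditioned on the query together with each retrieved passage.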
Formally, RAG treats retrieved documents as latent variables:
P(y|x) = Σ_i P_ret(z_i|x) × P_gen(y|x, z_i)
where P_ret is the retriever's distribution over documents and P_gen is the generator's probability of the output conditioned on the query and the retrieved passage.
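A worked instance of this marginalization, with made-up numbers: suppose the retriever proposes three passages and the generator scores the same candidate answer y against each of them.

```python
import numpy as np

# Hypothetical retriever distribution P_ret(z_i|x) over three passages.
p_ret = np.array([0.6, 0.3, 0.1])
# Hypothetical generator probabilities P_gen(y|x, z_i) for a candidate y.
p_gen = np.array([0.9, 0.2, 0.05])

# P(y|x) = Σ_i P_ret(z_i|x) × P_gen(y|x, z_i) — a dot product.
p_y = float(p_ret @ p_gen)
print(round(p_y, 3))  # 0.6*0.9 + 0.3*0.2 + 0.1*0.05 = 0.605
```

Note that y is scored under every retrieved passage and the results are weighted by how much the retriever trusts each passage, so no single retrieval has to be correct on its own.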
This creates a system with two kinds of memory: parametric memory (knowledge encoded in model weights) and non-parametric memory (external corpus accessed via retrieval).