Query-Key-Value Framework

The computational structure underlying attention mechanisms. Three vectors are derived from each input:

- Query (Q): What am I looking for?
- Key (K): What do I contain that might be relevant?
- Value (V): What information do I actually contribute?

The attention computation: compare each query against all keys to get relevance scores, normalize the scores with a softmax, then use them to form a weighted sum of the values as the output.

Concretely: Attention(Q, K, V) = softmax(QKᵀ / √dₖ) × V
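The formula translates almost line-for-line into NumPy. A minimal sketch (the shapes and the `attention` name are illustrative, not from any particular library):

```python
import numpy as np

def attention(Q, K, V):
    # Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)        # relevance of each query to each key
    # Row-wise softmax (subtracting the max for numerical stability)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                     # weighted blend of values per query

rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 4))   # 3 queries,  d_k = 4
K = rng.standard_normal((5, 4))   # 5 keys,     d_k = 4
V = rng.standard_normal((5, 8))   # 5 values,   d_v = 8
out = attention(Q, K, V)
print(out.shape)                  # one d_v-dimensional output per query
```

Note the output has one row per query: each query retrieves its own blend of the values.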

The division by √dₖ (the key dimension) counteracts the growth of the dot products: with unit-variance components, a query-key dot product has variance dₖ, so the scores grow with dimension. Scaling by √dₖ keeps them at unit variance; without it, large-magnitude scores push the softmax toward near-one-hot outputs, where gradients are tiny.
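The variance argument can be checked numerically. A minimal sketch, assuming query and key components are drawn i.i.d. from a standard normal:

```python
import numpy as np

rng = np.random.default_rng(0)
variances = {}
for d_k in (4, 64, 1024):
    q = rng.standard_normal((10_000, d_k))
    k = rng.standard_normal((10_000, d_k))
    dots = (q * k).sum(axis=1)                        # unscaled dot products
    # Empirical variance before and after dividing by sqrt(d_k)
    variances[d_k] = (dots.var(), (dots / np.sqrt(d_k)).var())
    print(d_k, variances[d_k])
```

The unscaled variance tracks dₖ (≈4, ≈64, ≈1024), while the scaled variance stays near 1 regardless of dimension.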

The metaphor that helps: queries are questions, keys are labels on filing cabinets, values are the contents inside. You match your question to the labels, then retrieve a weighted blend of contents.

Related: 05-atom—self-attention-definition, 05-molecule—attention-mechanism-concept