Query-Key-Value Framework
The computational structure underlying attention mechanisms. Three vectors are derived from each input:
Query (Q): What am I looking for?
Key (K): What do I contain that might be relevant?
Value (V): What information do I actually contribute?
The attention computation: compare each query against all keys to get relevance scores, normalize those scores with a softmax, then use the resulting weights to blend the values into an output.
Concretely: Attention(Q, K, V) = softmax(QKᵀ / √dₖ) × V
The division by √dₖ (the square root of the key dimension) keeps the dot products from growing in magnitude as dₖ increases, which would otherwise push the softmax into saturated regions where gradients vanish.
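The formula can be sketched directly in NumPy. This is an illustrative single-head version with made-up shapes, not an implementation from any particular library:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v)
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (n_queries, n_keys) relevance scores
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V                   # (n_queries, d_v) weighted blend of values

rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out = attention(Q, K, V)
print(out.shape)  # (2, 4): one blended value vector per query
```

Each output row is a convex combination of the value rows, with the mixing weights determined by query-key similarity.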
The metaphor that helps: queries are questions, keys are labels on filing cabinets, values are the contents inside. You match your question to the labels, then retrieve a weighted blend of contents.
Related: 05-atom—self-attention-definition, 05-molecule—attention-mechanism-concept