Self-Attention

An attention mechanism in which a sequence attends to itself to compute its own representation. Each position in the sequence can attend to every position in that same sequence, including itself.

In practice: every token “looks at” every other token to decide how much each should influence its own representation. A word’s meaning in context emerges from its weighted relationships to all other words.

This is sometimes called intra-attention to distinguish it from attention between different sequences (like between an input and output in translation).
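A minimal sketch of the idea, assuming a single head and randomly initialized projection matrices (the names `Wq`, `Wk`, `Wv` and the toy dimensions are illustrative, not from the source). Each row of the attention matrix shows how much one token weights every token in the same sequence, itself included:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention over one sequence X of shape (seq_len, d_model)."""
    Q = X @ Wq  # queries: what each token is looking for
    K = X @ Wk  # keys: what each token offers for matching
    V = X @ Wv  # values: the content that gets mixed together
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (seq_len, seq_len): every token scored against every token
    weights = softmax(scores, axis=-1)  # each row sums to 1: one token's attention over the sequence
    return weights @ V                  # context-aware representation for each position

# Toy example (hypothetical sizes): 4 tokens, model and head dimension 8.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8): same sequence length, each token re-represented via its neighbors
```

The query, key, and value projections here are the ones described in 05-atom—query-key-value-framework; the only thing that makes this *self*-attention is that all three are computed from the same sequence `X`.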

Related: 05-atom—query-key-value-framework, 05-molecule—attention-mechanism-concept