Self-Attention

An attention mechanism in which a sequence attends to itself to compute its own representation. Each position in the sequence can attend to every position in that same sequence, including itself.

In practice: every token “looks at” every other token to decide how much each should influence its own representation. A word’s meaning in context emerges from its weighted relationships to all other words.

This is sometimes called intra-attention to distinguish it from attention between different sequences (like between an input and output in translation).
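A minimal sketch of the idea, assuming a single head and randomly initialized projection matrices (the names `Wq`, `Wk`, `Wv` and the toy dimensions are illustrative, not from the source). Each row of the attention matrix shows how much one token weights every token in the same sequence, itself included:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention over one sequence X of shape (seq_len, d_model)."""
    Q = X @ Wq  # queries: what each token is looking for
    K = X @ Wk  # keys: what each token offers for matching
    V = X @ Wv  # values: the content that gets mixed together
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (seq_len, seq_len): every token scored against every token
    weights = softmax(scores, axis=-1)  # each row sums to 1: one token's attention over the sequence
    return weights @ V                  # context-aware representation for each position

# Toy example (hypothetical sizes): 4 tokens, model and head dimension 8.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8): same sequence length, each token re-represented via its neighbors
```

The query, key, and value projections here are the ones described in 05-atom—query-key-value-framework; the only thing that makes this *self*-attention is that all three are computed from the same sequence `X`.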

Related: 05-atom—query-key-value-framework, 05-molecule—attention-mechanism-concept