Attention as Selective Focus
In neural networks, attention mechanisms enable models to selectively focus on relevant parts of their input rather than processing everything uniformly. This concept mirrors human cognitive attention, the ability to concentrate on specific information while filtering out noise.
Core Principle
Attention is fundamentally a weighting mechanism, not understanding. The model learns which parts of the input sequence deserve more computational focus when producing each output. This is formalized through attention scores that determine how much each input element contributes to each output element.
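One common formalization is the scaled dot-product attention used in Transformer models, which makes the weighting explicit: each output is a weighted sum of value vectors, with the weights given by a softmax over query-key scores.

$$
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right) V
$$

The softmax rows are the attention scores: non-negative weights that sum to 1 and determine how much each input position contributes to each output position.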
Key Insight
The revolutionary aspect of attention is that it lets models handle variable-length sequences without the bottleneck of compressing the entire input into a single fixed-size representation. Unlike recurrent architectures, which must pass information through the sequence step by step, attention can directly connect any two positions in a single operation, as the sketch below illustrates.
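A minimal NumPy sketch (illustrative only; the function name, shapes, and test values are assumptions, not from this note) shows both points: the same code handles any sequence length n, and the resulting n-by-n weight matrix links every pair of positions directly.

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention over a sequence of any length n.

    Q, K, V: arrays of shape (n, d). Returns (output, weights), where
    weights has shape (n, n): every position attends to every other
    position in one step, with no step-by-step propagation.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # (n, n) pairwise scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax: rows sum to 1
    return weights @ V, weights

# Works unchanged for n = 5 or n = 500: no fixed-size bottleneck.
n, d = 5, 8
rng = np.random.default_rng(0)
x = rng.normal(size=(n, d))
out, w = attention(x, x, x)       # self-attention: Q = K = V = x
print(out.shape, w.shape)         # (5, 8) (5, 5)
```

In a recurrent network, information from the first position must pass through n - 1 hidden-state updates to influence the last; here the single weight w[n-1, 0] connects them directly.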
Related Concepts
The attention mechanism manifests in several forms:
- [05-atom—multi-head-attention-definition] enables parallel attention patterns
- [05-atom—attention-heads-specialization] shows different heads learn distinct linguistic features
- [05-atom—positional-encoding-definition] provides sequence order information that attention alone cannot capture
- [05-atom—attention-vs-understanding-distinction] clarifies what attention actually computes
For architectural context, see [05-molecule—attention-mechanism-concept] and [05-molecule—attention-vs-recurrence-comparison].
The Engine Room series provides deeper technical explanation in [05-organism—engine-room-02-attention].