Context Window Limitations
The context window is the maximum number of tokens a model can process in a single interaction, covering both the prompt and the generated output. It's a hard constraint on what the model can "see" at once.
Why It Exists
Self-attention compares every token with every other token, so compute and memory grow quadratically with sequence length: doubling the context window roughly quadruples the cost. Context windows have grown from 2K tokens (early GPT models) to 100K+ tokens, but they remain finite.
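A back-of-the-envelope sketch of that scaling: the attention score matrix holds one entry per token pair, so its size grows as n².

```python
# Rough illustration: the n x n attention score matrix (per head, per
# layer) holds one entry per token pair, so the work to fill it grows
# quadratically with sequence length n.
for n in [2_048, 8_192, 32_768, 131_072]:
    pairs = n * n  # entries in the attention score matrix
    print(f"{n:>8} tokens -> {pairs:>16,} score entries "
          f"({pairs / 2_048**2:,.0f}x the 2K baseline)")
```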
What It Means
No True Memory: Content outside the context window doesn’t exist to the model. Every new conversation starts fresh unless context is explicitly provided.
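A minimal sketch of what this forces on application code, assuming a hypothetical generate() stand-in for any chat-completion API: every turn must resend the accumulated history, because the model retains nothing between calls.

```python
def generate(prompt: str) -> str:
    # Hypothetical stand-in for a real chat-completion API call.
    return "(model reply)"

history: list[str] = []

def chat(user_message: str) -> str:
    history.append(f"User: {user_message}")
    # The model sees only what we pass in: prior turns must be replayed
    # explicitly, and the replay itself consumes context-window budget.
    reply = generate("\n".join(history))
    history.append(f"Assistant: {reply}")
    return reply
```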
Position Effects: Information at the beginning and end of context is better retained than information in the middle (“lost in the middle” phenomenon).
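One common way to measure this (a sketch with made-up filler and needle text) is to embed a known fact at varying depths in otherwise irrelevant context, then check whether the model can answer a question about it.

```python
filler = "The sky was a uniform grey that afternoon. " * 2_000  # irrelevant padding
needle = "The access code for the vault is 7431."

def probe_prompt(depth: float) -> str:
    """Place the needle at a fractional depth (0.0 = start, 1.0 = end)."""
    cut = int(len(filler) * depth)
    context = filler[:cut] + needle + " " + filler[cut:]
    return context + "\n\nQuestion: What is the access code for the vault?"

# Retrieval accuracy typically dips for depths near 0.5 (the middle)
# relative to depths near 0.0 or 1.0.
prompts = {d: probe_prompt(d) for d in (0.0, 0.25, 0.5, 0.75, 1.0)}
```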
RAG as Workaround: Retrieval-augmented generation works around context limits by selectively inserting relevant information, but introduces its own challenges (retrieval quality, context assembly).
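A minimal sketch of the retrieve-then-assemble pattern, ranking chunks by cosine similarity. The embed() function here is a toy trigram hash standing in for a real embedding model.

```python
import math

def embed(text: str) -> list[float]:
    # Toy stand-in: a real system would call an embedding model here.
    vec = [0.0] * 64
    for i in range(len(text) - 2):
        vec[hash(text[i:i + 3]) % 64] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Rank chunks by similarity to the query; keep only the top k."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

def assemble_prompt(query: str, chunks: list[str]) -> str:
    """Context assembly: only the retrieved chunks enter the window."""
    context = "\n---\n".join(retrieve(query, chunks))
    return f"Context:\n{context}\n\nQuestion: {query}"
```

The two challenges named above map directly onto the two steps: retrieval quality is whether retrieve() ranks the right chunks first, and context assembly is how assemble_prompt() orders and formats what survives.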
Practical Implications
Long documents must be chunked. Multi-turn conversations must be summarized or truncated. Knowledge cutoffs matter because everything the model learned is frozen in its weights; anything newer must be supplied in context. These aren't bugs to be fixed; they're architectural constraints.
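A sketch of the chunking step, splitting on whitespace as a stand-in for a real tokenizer, with overlap so a fact straddling a boundary appears intact in at least one chunk:

```python
def chunk(text: str, size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into windows of `size` tokens, with `overlap` tokens
    shared between neighbours. Whitespace split approximates a tokenizer."""
    tokens = text.split()
    step = size - overlap
    return [" ".join(tokens[i:i + size])
            for i in range(0, max(len(tokens) - overlap, 1), step)]
```

The overlap trades a little redundancy (and context budget) for robustness: without it, any statement split across a chunk boundary is invisible to retrieval.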
Related: 05-molecule—attention-mechanism-concept, 07-molecule—rag-core-tradeoffs