Context Window Limitations

The context window is the maximum amount of text a model can process in a single interaction. It’s a hard constraint on what the model can “see” at once.
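Token counts, not characters, are what the limit is measured in, so it helps to check usage with a tokenizer. A minimal sketch, assuming the tiktoken library and the cl100k_base encoding (substitute the encoding that matches your model; the 8192-token limit is hypothetical):

```python
# Sketch: measuring how much of a context window a prompt consumes.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
prompt = "Summarize the following document: ..."
n_tokens = len(enc.encode(prompt))  # encode() returns a list of token ids

CONTEXT_WINDOW = 8192  # hypothetical model limit
print(f"{n_tokens} tokens used, {CONTEXT_WINDOW - n_tokens} remaining")
```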

Why It Exists

Self-attention compares every token with every other token, so its cost grows quadratically with sequence length: doubling the context window roughly quadruples the compute and memory of the attention layers. Context windows have grown from about 2K tokens (e.g., GPT-3) to 100K+ tokens, but they remain finite.
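A toy numpy sketch of where the quadratic term comes from: the attention score matrix has one entry per pair of positions, so it holds n² values. The sizes below are illustrative.

```python
# Sketch: scaled dot-product attention over a toy sequence, showing the
# n x n score matrix that makes attention quadratic in sequence length.
import numpy as np

n, d = 1024, 64                    # sequence length, head dimension
Q = np.random.randn(n, d)
K = np.random.randn(n, d)
V = np.random.randn(n, d)

scores = Q @ K.T / np.sqrt(d)      # shape (n, n): one score per token pair
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
out = weights @ V                  # shape (n, d)

# Doubling n quadruples the score matrix: (2n)^2 = 4 * n^2 entries.
print(scores.shape)                # (1024, 1024)
```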

What It Means

No True Memory: Content outside the context window doesn’t exist to the model. Every new conversation starts fresh unless context is explicitly provided.

Position Effects: Information at the beginning and end of context is better retained than information in the middle (“lost in the middle” phenomenon).

RAG as Workaround: Retrieval-augmented generation works around context limits by retrieving the most relevant passages and inserting only those into the prompt, but it introduces its own challenges (retrieval quality, context assembly).
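A minimal sketch of that pattern, under stated assumptions: the corpus, the character budget, and the keyword-overlap scorer are all illustrative (real systems typically rank by embedding similarity).

```python
# Sketch: retrieval-augmented prompt assembly. Retrieve the highest-scoring
# documents and insert them into the prompt until a budget is exhausted.
def score(query: str, doc: str) -> int:
    """Naive relevance: how many query words appear in the document."""
    return sum(word in doc.lower() for word in query.lower().split())

def build_prompt(query: str, corpus: list[str], budget_chars: int = 2000) -> str:
    """Pack the most relevant documents into the prompt, best first."""
    ranked = sorted(corpus, key=lambda d: score(query, d), reverse=True)
    context, used = [], 0
    for doc in ranked:
        if used + len(doc) > budget_chars:
            break
        context.append(doc)
        used += len(doc)
    return "Context:\n" + "\n---\n".join(context) + f"\n\nQuestion: {query}"

corpus = [
    "Attention cost grows quadratically with sequence length.",
    "RAG retrieves relevant passages and inserts them into the prompt.",
    "Context windows have grown from 2K to 100K+ tokens.",
]
print(build_prompt("Why is attention quadratic?", corpus))
```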

Practical Implications

Long documents must be chunked. Multi-turn conversations must be summarized or truncated. Knowledge cutoffs persist because new information can only be supplied through the limited context at inference time, not absorbed into the weights. These aren't bugs to be fixed; they're architectural constraints.
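For the chunking step, a minimal fixed-size chunker with overlap (the sizes are illustrative assumptions; production chunkers usually split on semantic boundaries like paragraphs or sections):

```python
# Sketch: split a long document into overlapping fixed-size chunks so
# each chunk fits comfortably inside the context window.
def chunk(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Slide a window of `size` characters, stepping by size - overlap."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

doc = "word " * 1000
pieces = chunk(doc)
print(len(pieces), "chunks;", len(pieces[0]), "chars each")
```

The overlap keeps sentences that straddle a boundary intact in at least one chunk, at the cost of some duplicated tokens.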

Related: 05-molecule—attention-mechanism-concept, 07-molecule—rag-core-tradeoffs