Context Windows and Memory

Engine Room Article 5: What Models Actually ‘Remember’


How Context Works

A context window is the text the model can process when generating a response. It includes system prompts, conversation history, any documents you’ve included - everything the model can “see” for this particular response.

The key insight: models are stateless. Each response is generated fresh from whatever’s in the current context window. Previous conversations aren’t remembered unless explicitly included.

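A minimal sketch of what this means in practice, assuming a generic chat API: the entire conversation must travel with every request. call_model is a hypothetical stand-in, not any particular provider's SDK.

    # Minimal sketch of statelessness: the whole conversation is re-sent on
    # every request. call_model is a hypothetical stand-in for a real chat
    # API; nothing here is specific to any provider.

    def call_model(messages: list[dict]) -> str:
        """Hypothetical model call; a real version would hit an LLM API."""
        return f"(reply generated from {len(messages)} messages of context)"

    def chat_turn(history: list[dict], user_input: str) -> list[dict]:
        """One turn: the model sees only what is passed in right now."""
        history = history + [{"role": "user", "content": user_input}]
        reply = call_model(history)  # the full history travels with every call
        return history + [{"role": "assistant", "content": reply}]

    history = [{"role": "system", "content": "You are a helpful assistant."}]
    history = chat_turn(history, "My name is Ada.")
    history = chat_turn(history, "What is my name?")
    # The second answer can only involve "Ada" because the first exchange is
    # still in history. Drop those messages and the model has no idea.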

Bigger Windows, Different Tradeoffs

Context windows have grown substantially - from a few thousand tokens to 100K+ in some models. This sounds like pure improvement, but it comes with tradeoffs.

Longer contexts are slower and more expensive to process. More importantly, research shows models don’t use long contexts uniformly - information in the middle tends to get less attention than content at the beginning and end.
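One way to work with this tendency, sketched below under the assumption that your retrieved chunks arrive already ranked by relevance: interleave them so the strongest material sits at the edges of the prompt and the weakest lands in the middle. The exact placement strategy here is illustrative, not a prescribed algorithm.

    # Sketch of one mitigation: given chunks already ranked by relevance,
    # interleave them so the strongest material sits at the beginning and
    # end of the prompt and the weakest lands in the middle.

    def reorder_for_long_context(chunks_by_relevance: list[str]) -> list[str]:
        """Alternate top-ranked chunks between the front and back of the context."""
        front: list[str] = []
        back: list[str] = []
        for i, chunk in enumerate(chunks_by_relevance):
            (front if i % 2 == 0 else back).append(chunk)
        return front + back[::-1]  # least relevant ends up in the middle

    ranked = ["most relevant", "second", "third", "fourth", "least relevant"]
    print(reorder_for_long_context(ranked))
    # ['most relevant', 'third', 'least relevant', 'fourth', 'second']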

RAG and Context Management

Retrieval-Augmented Generation uses context windows differently: instead of including everything, you retrieve relevant chunks and include only those.

Thoughtful selection of what to include often beats dumping everything into a large context window.
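A minimal sketch of that selection step. A real pipeline would use embeddings and a vector store; crude keyword overlap stands in here so the example is self-contained, and build_prompt is an illustrative helper, not a library function.

    # Minimal retrieval sketch: score stored chunks against the question and
    # include only the top matches in the prompt. Keyword overlap stands in
    # for embeddings so the example stays self-contained.

    def score(question: str, chunk: str) -> int:
        """Crude relevance score: number of shared lowercase words."""
        return len(set(question.lower().split()) & set(chunk.lower().split()))

    def build_prompt(question: str, chunks: list[str], k: int = 2) -> str:
        """Keep only the k best-scoring chunks and wrap them in a prompt."""
        top = sorted(chunks, key=lambda c: score(question, c), reverse=True)[:k]
        context = "\n\n".join(top)
        return f"Answer using only this context:\n\n{context}\n\nQuestion: {question}"

    docs = [
        "The context window is the text the model can see for one response.",
        "Models are stateless; prior turns must be re-sent explicitly.",
        "Billing is per token, for both input and output.",
    ]
    print(build_prompt("Why must prior turns be re-sent to the model?", docs))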

Practical Implications

Be explicit about what’s needed. Don’t assume the model remembers anything from outside the current context.

Position matters. Put the most important information at the beginning or end of your context, not buried in the middle.

More isn’t always better. Selective, relevant context often outperforms comprehensive context.
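One way to make that management concrete, assuming word counts as a rough proxy for tokens: keep the system prompt, then fill the rest of a fixed budget with the most recent turns.

    # Sketch of explicit context management: always keep the system prompt,
    # then fill the remaining budget with the most recent turns. Word counts
    # stand in for real tokenization; swap in the model's tokenizer in practice.

    def approx_tokens(text: str) -> int:
        return len(text.split())

    def trim_history(messages: list[dict], budget: int) -> list[dict]:
        """Keep the system message, then add turns newest-first until the budget is hit."""
        system, rest = messages[0], messages[1:]
        kept: list[dict] = []
        used = approx_tokens(system["content"])
        for msg in reversed(rest):  # walk from the newest turn backwards
            cost = approx_tokens(msg["content"])
            if used + cost > budget:
                break
            kept.append(msg)
            used += cost
        return [system] + list(reversed(kept))  # restore chronological order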


Context is working memory, not long-term memory. Design systems that explicitly manage what the model sees for each response.

Related: 07-source—engine-room-series