Short-Term Memory
Master how AI agents manage conversation context and working memory
Understanding Context Windows
A context window is the amount of text (measured in tokens) that an LLM can process in a single forward pass. It acts as the agent's immediate working memory—everything the model needs to consider when generating its next response.
Think of it like RAM in a computer: larger context windows allow agents to hold more information simultaneously, but come at the cost of slower processing and higher computational expense.
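To make the budget concrete, here is a minimal sketch of checking whether a conversation still fits the window. Everything here is illustrative: `estimate_tokens` is a crude character-count heuristic standing in for the model's real tokenizer (e.g., tiktoken for OpenAI models), and the 8,192-token limit is a made-up figure, not any particular model's.

```python
MAX_TOKENS = 8192  # hypothetical window size; varies by model

def estimate_tokens(text: str) -> int:
    # Very rough heuristic: roughly 4 characters per English token.
    # Production code would use the model's actual tokenizer.
    return max(1, len(text) // 4)

def fits_in_window(messages: list[str], reserve_for_reply: int = 512) -> bool:
    # Leave headroom so the model still has room to generate its response.
    used = sum(estimate_tokens(m) for m in messages)
    return used + reserve_for_reply <= MAX_TOKENS

print(fits_in_window(["Hello!", "Summarize our discussion so far."]))  # True
```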
A standard-sized context window is good for most chat applications and can handle detailed multi-turn conversations.
How Models Handle Context Limits
🔄 Sliding Window
Drop oldest messages first (FIFO). Simple but loses early context. Used by most chat apps.
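A minimal sketch of FIFO eviction under a token budget. The `estimate_tokens` helper is the same illustrative stand-in for a real tokenizer used above, not a library function.

```python
from collections import deque

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # rough ~4 chars/token heuristic

def sliding_window(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the newest messages that fit the budget; the oldest drop first."""
    kept: deque[str] = deque()
    used = 0
    for msg in reversed(messages):  # walk from newest to oldest
        cost = estimate_tokens(msg)
        if used + cost > max_tokens:
            break  # everything older than this point is evicted
        kept.appendleft(msg)
        used += cost
    return list(kept)

history = ["turn 1 " * 50, "turn 2 " * 50, "turn 3 " * 50]
print(len(sliding_window(history, max_tokens=200)))  # 2: oldest turn dropped
```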
📝 Summarization
Compress old messages into summaries. Retains key info but loses details. Better for long sessions.
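A sketch of the idea, with `summarize` stubbed out by truncation so the example stays runnable; in practice that call would be an LLM request asking for a compressed summary of the older turns.

```python
def summarize(text: str) -> str:
    # Stand-in for an LLM summarization call.
    return text[:200] + ("..." if len(text) > 200 else "")

def compress_history(messages: list[str], keep_recent: int = 4) -> list[str]:
    """Fold all but the most recent turns into one summary message."""
    if len(messages) <= keep_recent:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    return [f"[Summary of earlier turns] {summarize(' '.join(old))}"] + recent
```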
⭐ Importance Filtering
Keep messages with high relevance scores. Requires extra computation but preserves critical context.
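One way this might look, using a toy term-overlap score as the relevance function. Production systems typically score with embedding similarity or a learned relevance model; the threshold value here is arbitrary.

```python
def relevance(msg: str, query_terms: set[str]) -> float:
    # Toy score: fraction of query terms the message mentions.
    return len(set(msg.lower().split()) & query_terms) / max(1, len(query_terms))

def filter_by_importance(messages: list[str], query: str,
                         threshold: float = 0.2) -> list[str]:
    """Keep only messages that clear a relevance bar for the current query."""
    terms = set(query.lower().split())
    return [m for m in messages if relevance(m, terms) >= threshold]

msgs = ["We chose PostgreSQL for storage.", "Nice weather today."]
print(filter_by_importance(msgs, "which database did we choose"))
# ['We chose PostgreSQL for storage.']
```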
🔀 Hybrid Approach
Combine strategies: summarize middle, keep recent and important. Best results but most complex.
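A sketch combining the previous ideas, again with toy stand-ins for summarization and relevance scoring: low-relevance middle turns get folded into a summary, while recent and important turns survive verbatim.

```python
def hybrid_context(messages: list[str], query: str,
                   keep_recent: int = 2, threshold: float = 0.2) -> list[str]:
    """Summarize low-relevance middle turns; keep recent and important ones."""
    def summarize(text: str) -> str:
        return text[:200]  # stand-in for an LLM summarization call

    def relevance(msg: str) -> float:
        terms = set(query.lower().split())
        return len(set(msg.lower().split()) & terms) / max(1, len(terms))

    if len(messages) <= keep_recent:
        return messages
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    important = [m for m in older if relevance(m) >= threshold]
    rest = [m for m in older if relevance(m) < threshold]
    summary = [f"[Summary] {summarize(' '.join(rest))}"] if rest else []
    return summary + important + recent
```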
Key Insights
- Larger windows ≠ better: they are slower, more expensive, and can dilute the model's attention
- Hard limit: unlike attention quality, which degrades gradually over long inputs, the context window is a strict boundary; exceeding it means truncation and data loss
- Planning matters: design conversations to fit within limits (chunking, summaries, retrieval)
- Cost scales linearly: 2x the tokens = 2x the cost per request (see the sketch below)
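To make the last point concrete, a quick back-of-the-envelope calculation. The per-token rate is made up for illustration; real provider pricing varies.

```python
# Hypothetical rate, not any real provider's pricing.
price_per_1k_tokens = 0.01  # dollars per 1,000 input tokens

for tokens in (4_000, 8_000):
    print(f"{tokens} tokens -> ${tokens / 1_000 * price_per_1k_tokens:.2f}")
# 4000 tokens -> $0.04
# 8000 tokens -> $0.08  (double the tokens, double the cost)
```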