Short-Term Memory

Master how AI agents manage conversation context and working memory

Managing Limited Memory

Since context windows are finite, agents need strategies to manage what stays in memory and what gets dropped. The right approach depends on your use case: chat apps, document Q&A, or long-running agents.

Let's explore the main strategies and when to use each one.

Strategy Comparison

🔄 Sliding Window

Drop the oldest messages when the window fills up. Simple FIFO (first-in, first-out) queue.

✅ Pros
  • Simple to implement
  • Low overhead
  • Predictable behavior
❌ Cons
  • Loses early context
  • No prioritization
  • Poor for long sessions
Best For:

Short chat sessions, customer support, simple Q&A where recent context is most important.

messages = messages[-max_messages:] # Keep last N messages
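A slightly fuller sketch of the same idea uses Python's collections.deque, which enforces the FIFO eviction automatically (the window size of 20 is an assumed example value):

from collections import deque

MAX_MESSAGES = 20  # assumed window size; tune to your model's context limit

messages = deque(maxlen=MAX_MESSAGES)  # oldest entry is evicted automatically

messages.append({"role": "user", "content": "Hi"})
messages.append({"role": "assistant", "content": "Hello! How can I help?"})
# Once 20 messages are stored, each append silently drops the oldest one.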

Compression Simulator

See how summarization reduces token usage: a higher compression ratio means more aggressive summarization. Typical ratios range from 2:1 (mild) through 5:1 (moderate) to 10:1 (aggressive). The example below uses 4:1.

At a 4:1 ratio on a 4,096-token history:

  • Before compression: 4,096 tokens
  • After compression: 1,024 tokens
  • Tokens saved: 3,072 (a 75% reduction)
  • Cost saved: $0.031
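The arithmetic behind those numbers is straightforward. A minimal sketch, assuming an example price of $0.01 per 1K input tokens (substitute your model's actual rate):

def compression_savings(tokens_before, ratio, price_per_1k=0.01):
    """Estimate savings from summarizing at a given compression ratio.

    price_per_1k is an assumed example rate in $ per 1K input tokens.
    """
    tokens_after = int(tokens_before / ratio)
    saved = tokens_before - tokens_after
    return {
        "tokens_after": tokens_after,
        "tokens_saved": saved,
        "reduction_pct": round(100 * saved / tokens_before, 1),
        "cost_saved": round(saved * price_per_1k / 1000, 3),
    }

print(compression_savings(4096, 4))
# {'tokens_after': 1024, 'tokens_saved': 3072, 'reduction_pct': 75.0, 'cost_saved': 0.031}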

Implementation Tips

💡 Always Keep System Prompt

The system prompt should never be dropped or summarized: it defines the agent's behavior and must stay intact.
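One way to enforce this is to pin system messages outside the sliding window. A sketch, assuming max_messages is larger than the number of system messages:

def trim_keep_system(messages, max_messages):
    """Slide the window over everything except the system prompt."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    keep = max_messages - len(system)  # room left after pinning system messages
    return system + rest[-keep:]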

💡 Preserve Recent Messages

Always keep the last 3-5 messages in full. Users expect agents to remember what was just said.
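Combined with summarization, that means compressing only the older turns. A sketch, where summarize() is a hypothetical callable (e.g. an LLM call) that condenses a list of messages into one short string:

KEEP_RECENT = 5  # assumed: how many recent messages stay verbatim

def compact(messages, summarize):
    """Summarize older turns; keep the recent tail word-for-word."""
    if len(messages) <= KEEP_RECENT:
        return messages
    head, tail = messages[:-KEEP_RECENT], messages[-KEEP_RECENT:]
    summary = {"role": "system",
               "content": "Summary of earlier conversation: " + summarize(head)}
    return [summary] + tail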

💡 Monitor Token Usage

Track token counts per request. Set alerts when approaching context limits to trigger compression proactively.
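A minimal sketch using the tiktoken library (the 8,192-token limit and 80% threshold are assumed example values; the count also ignores the small per-message formatting overhead):

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

CONTEXT_LIMIT = 8192  # assumed model context window
COMPRESS_AT = 0.8     # assumed: trigger compression at 80% usage

def total_tokens(messages):
    return sum(len(enc.encode(m["content"])) for m in messages)

def should_compress(messages):
    return total_tokens(messages) > COMPRESS_AT * CONTEXT_LIMIT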

💡 Test Strategy Changes

A/B test different strategies with real users. What works theoretically may not align with user expectations.
