Short-Term Memory

Master how AI agents manage immediate information through context windows and attention mechanisms

The Immediate Memory Challenge

Imagine trying to hold a conversation while only remembering the last 7 things anyone said. That's essentially what short-term memory is—a temporary workspace where information is held "in mind" just long enough to use it.

For AI agents, short-term memory is implemented through context windows—the maximum amount of text (measured in tokens) that an agent can "see" at once. Everything outside this window is effectively forgotten.

Understanding short-term memory is crucial because it determines how much information an agent can process simultaneously and how well it can maintain conversational coherence.
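The idea of a hard token ceiling can be sketched in a few lines. This is a hypothetical helper, not a real tokenizer: it uses the common rough heuristic of ~4 characters per token for English text, so `estimate_tokens` and `fits_in_window` are illustrative names only.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    # Real systems use an actual tokenizer (e.g. BPE); this is illustrative.
    return max(1, len(text) // 4)

def fits_in_window(messages: list[str], window_tokens: int = 4096) -> bool:
    # The agent can only "see" messages whose combined token count
    # stays within the context window; everything beyond it is invisible.
    return sum(estimate_tokens(m) for m in messages) <= window_tokens
```

Anything that does not pass this check has to be truncated, summarized, or dropped before the agent can respond.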

Context vs Attention

Context Window: Hard Limit

The context window is a hard boundary on how much text can fit into memory. Think of it as a fixed-size notepad.

Storage Type: Sequential text buffer
Limit Type: Token count (hard ceiling)
When Exceeded: Oldest info is dropped
Example: 4K tokens ≈ 3,000 words

Key Point: Context windows are like a rolling conveyor belt—new information pushes out old information when the limit is reached.
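The conveyor-belt behavior can be sketched as a token-bounded buffer that evicts the oldest messages once the ceiling is exceeded. This `ContextWindow` class is a minimal illustration, not a production implementation; token counts are supplied directly for simplicity, where a real agent would run a tokenizer.

```python
from collections import deque

class ContextWindow:
    """Token-bounded message buffer: new entries push out the oldest."""

    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.messages = deque()  # (text, token_count) pairs
        self.total = 0

    def add(self, text: str, tokens: int) -> None:
        self.messages.append((text, tokens))
        self.total += tokens
        # Evict the oldest messages until the buffer fits the hard ceiling.
        while self.total > self.max_tokens:
            _, dropped = self.messages.popleft()
            self.total -= dropped

window = ContextWindow(max_tokens=10)
window.add("first", 4)
window.add("second", 4)
window.add("third", 4)   # pushes "first" off the conveyor belt
```

After the third call, only "second" and "third" remain in memory; "first" is effectively forgotten.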

Interactive: Token Limit Explorer

[Token Limit Explorer — slider from 1K to 128K+ tokens. At the standard 4,096-token setting, typical for most conversations, the window holds roughly 3,072 words, or a capacity of about 27 messages — a fit for chat apps.]
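The explorer's figures follow from two rules of thumb, both approximations rather than fixed constants: roughly 0.75 words per token for English text, and an assumed average chat message of about 150 tokens (the message size is an assumption chosen to match the numbers above; real messages vary widely).

```python
def window_capacity(window_tokens: int,
                    words_per_token: float = 0.75,
                    tokens_per_message: int = 150) -> tuple[int, int]:
    # Both ratios are rough heuristics; they vary by language and tokenizer.
    approx_words = int(window_tokens * words_per_token)
    approx_messages = window_tokens // tokens_per_message
    return approx_words, approx_messages

window_capacity(4096)  # → (3072, 27)
```

Scaling the window scales both numbers linearly, which is why larger windows trade directly against cost per interaction.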

Why Short-Term Memory Matters

✅ Enables

  • Conversational coherence
  • Multi-turn interactions
  • Context-aware responses
  • Real-time adaptation

⚠️ Limits

  • How long conversations can last
  • Amount of information per turn
  • Ability to reference old messages
  • Cost per interaction