Short-Term Memory

Master how AI agents manage immediate information through context windows and attention mechanisms

The Immediate Memory Challenge

Imagine trying to hold a conversation while only remembering the last 7 things anyone said. That's essentially what short-term memory is—a temporary workspace where information is held "in mind" just long enough to use it.

For AI agents, short-term memory is implemented through context windows—the maximum amount of text (measured in tokens) that an agent can "see" at once. Everything outside this window is effectively forgotten.

Understanding short-term memory is crucial because it determines how much information an agent can process simultaneously and how well it can maintain conversational coherence.
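The idea of a hard token ceiling can be sketched in a few lines. This is a hypothetical helper, not a real tokenizer: it uses the common rough heuristic of ~4 characters per token for English text, so `estimate_tokens` and `fits_in_window` are illustrative names only.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    # Real systems use an actual tokenizer (e.g. BPE); this is illustrative.
    return max(1, len(text) // 4)

def fits_in_window(messages: list[str], window_tokens: int = 4096) -> bool:
    # The agent can only "see" messages whose combined token count
    # stays within the context window; everything beyond it is invisible.
    return sum(estimate_tokens(m) for m in messages) <= window_tokens
```

Anything that does not pass this check has to be truncated, summarized, or dropped before the agent can respond.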

Context vs Attention

Context Window: Hard Limit

The context window is a hard boundary on how much text can fit into memory. Think of it as a fixed-size notepad.

Storage Type: Sequential text buffer
Limit Type: Token count (hard ceiling)
When Exceeded: Oldest info is dropped
Example: 4K tokens ≈ 3,000 words

Key Point: Context windows are like a rolling conveyor belt—new information pushes out old information when the limit is reached.
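The conveyor-belt behavior can be sketched as a token-bounded buffer that evicts the oldest messages once the ceiling is exceeded. This `ContextWindow` class is a minimal illustration, not a production implementation; token counts are supplied directly for simplicity, where a real agent would run a tokenizer.

```python
from collections import deque

class ContextWindow:
    """Token-bounded message buffer: new entries push out the oldest."""

    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.messages = deque()  # (text, token_count) pairs
        self.total = 0

    def add(self, text: str, tokens: int) -> None:
        self.messages.append((text, tokens))
        self.total += tokens
        # Evict the oldest messages until the buffer fits the hard ceiling.
        while self.total > self.max_tokens:
            _, dropped = self.messages.popleft()
            self.total -= dropped

window = ContextWindow(max_tokens=10)
window.add("first", 4)
window.add("second", 4)
window.add("third", 4)   # pushes "first" off the conveyor belt
```

After the third call, only "second" and "third" remain in memory; "first" is effectively forgotten.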

Interactive: Token Limit Explorer

[Token Limit Explorer — slider from 1K to 128K+ tokens. At the standard 4,096-token setting, typical for most conversations, the window holds roughly 3,072 words, or a capacity of about 27 messages — a fit for chat apps.]
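The explorer's figures follow from two rules of thumb, both approximations rather than fixed constants: roughly 0.75 words per token for English text, and an assumed average chat message of about 150 tokens (the message size is an assumption chosen to match the numbers above; real messages vary widely).

```python
def window_capacity(window_tokens: int,
                    words_per_token: float = 0.75,
                    tokens_per_message: int = 150) -> tuple[int, int]:
    # Both ratios are rough heuristics; they vary by language and tokenizer.
    approx_words = int(window_tokens * words_per_token)
    approx_messages = window_tokens // tokens_per_message
    return approx_words, approx_messages

window_capacity(4096)  # → (3072, 27)
```

Scaling the window scales both numbers linearly, which is why larger windows trade directly against cost per interaction.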

Why Short-Term Memory Matters

✅ Enables

  • Conversational coherence
  • Multi-turn interactions
  • Context-aware responses
  • Real-time adaptation

⚠️ Limits

  • How long conversations can last
  • Amount of information per turn
  • Ability to reference old messages
  • Cost per interaction