Memory Retrieval Strategies
Master how AI agents retrieve relevant memories to support intelligent decision-making and personalized responses
Multi-Factor Ranking
After retrieving candidate memories, agents must rank them to determine which ones to include in the context window. Effective ranking combines multiple factors: semantic relevance, recency, and importance.
Interactive: Adjust Ranking Weights

User query: "What ML framework should I use?"
Weights: relevance 50%, recency 30%, importance 20%

Ranked results (highest score first):

  #  Memory                                      Score   R     T     I
  1  Discussed TensorFlow vs PyTorch yesterday   0.85    0.88  0.90  0.70
  2  Working on computer vision project          0.84    0.85  0.80  0.85
  3  User prefers Python for ML projects         0.71    0.95  0.20  0.90
  4  User is senior ML engineer                  0.59    0.72  0.10  1.00

(R = semantic relevance, T = temporal recency, I = importance)
Experiment tip: Try maximizing recency weight to see how recent conversations dominate. Then maximize relevance to prioritize semantic similarity. Balance is key!
⚖️ Key Scoring Factors

🎯 Semantic Relevance
Cosine similarity between the query and memory embeddings. Captures semantic meaning.
Formula: cos(query_vec, memory_vec)

⏰ Temporal Recency
Exponential decay based on the time since the memory was created. Recent = more relevant.
Formula: e^(-λ × age_in_hours)

⭐ Memory Importance
User-defined or model-assigned importance score. Core facts rank higher.
Scale: 0.0 (trivial) to 1.0 (critical)
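The two computed factors above can be sketched in a few lines of Python. The decay rate λ = 0.01 is an illustrative choice, not a prescribed value:

```python
import math

def semantic_relevance(query_vec, memory_vec):
    # Cosine similarity between two embedding vectors: cos(query_vec, memory_vec)
    dot = sum(q * m for q, m in zip(query_vec, memory_vec))
    norm = math.sqrt(sum(q * q for q in query_vec)) * \
           math.sqrt(sum(m * m for m in memory_vec))
    return dot / norm

def temporal_recency(age_in_hours, decay_rate=0.01):
    # Exponential decay e^(-λ × age_in_hours); with λ = 0.01, a memory's
    # recency score halves roughly every 69 hours (ln 2 / λ).
    return math.exp(-decay_rate * age_in_hours)
```

Identical vectors score a relevance of 1.0, and a brand-new memory (age 0) has a recency of 1.0, decaying toward 0 as it ages.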
🧮 Combined Ranking Formula

Final Score = (α × relevance) + (β × recency) + (γ × importance)

α (alpha): relevance weight, typically 0.5-0.7
β (beta): recency weight, typically 0.2-0.3
γ (gamma): importance weight, typically 0.1-0.3

Weights must sum to 1.0. Adjust based on your application's needs.
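A minimal sketch of the combined formula, using the demo's weights (α = 0.5, β = 0.3, γ = 0.2) and the factor scores of the top-ranked memory from the example above:

```python
def final_score(relevance, recency, importance,
                alpha=0.5, beta=0.3, gamma=0.2):
    # Final Score = (α × relevance) + (β × recency) + (γ × importance)
    if abs(alpha + beta + gamma - 1.0) > 1e-9:
        raise ValueError("ranking weights must sum to 1.0")
    return alpha * relevance + beta * recency + gamma * importance

# Top-ranked memory from the demo: R=0.88, T=0.90, I=0.70
print(round(final_score(0.88, 0.90, 0.70), 2))  # → 0.85
```

The guard clause enforces the sum-to-1.0 constraint, so a misconfigured weight set fails loudly instead of silently skewing every ranking.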
🚀 Advanced Scoring Techniques

• Personalization: Learn user-specific weights over time. If a user frequently references old facts, increase the importance weight.
• Context-Aware Scoring: Adjust weights based on the query type: factual queries prioritize importance, while conversational ones prioritize recency.
• Diversity Penalty: Reduce the scores of memories that are too similar to already-selected ones to avoid redundancy (Maximal Marginal Relevance, MMR).
• Cross-Encoder Reranking: Use a heavier transformer model to rerank the top candidates with query-memory cross-attention for maximum accuracy.
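The diversity penalty can be made concrete with a greedy MMR loop: repeatedly pick the candidate that maximizes relevance minus its similarity to what is already selected. A minimal sketch with toy embeddings; the λ trade-off of 0.5 is illustrative:

```python
import math

def cosine(a, b):
    # Cosine similarity between two (non-zero) embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def mmr(candidates, k, lam=0.5):
    """Greedy Maximal Marginal Relevance selection.

    candidates: list of (id, relevance, embedding) tuples.
    lam trades off relevance (1.0 = pure relevance) against diversity (0.0).
    """
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def mmr_score(c):
            # Penalize similarity to the closest already-selected memory.
            max_sim = max((cosine(c[2], s[2]) for s in selected), default=0.0)
            return lam * c[1] - (1 - lam) * max_sim
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return [c[0] for c in selected]

# "B" is a near-duplicate of "A", so MMR skips it in favor of the distinct "C".
print(mmr([("A", 0.90, [1.0, 0.0]),
           ("B", 0.88, [0.99, 0.01]),
           ("C", 0.70, [0.0, 1.0])], k=2))  # → ['A', 'C']
```

Without the penalty, a pure relevance sort would return the redundant pair A and B; MMR trades a little relevance for coverage.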