
Memory Retrieval Strategies

Master how AI agents retrieve relevant memories to support intelligent decision-making and personalized responses

Multi-Factor Ranking

After retrieving candidate memories, agents must rank them to determine which ones to include in the context window. Effective ranking combines multiple factors: semantic relevance, recency, and importance.

Example: Adjusting Ranking Weights

User query: "What ML framework should I use?"
Weights: relevance (R) 50%, recency (T) 30%, importance (I) 20%

| Rank | Memory                                    | Score | R    | T    | I    |
|------|-------------------------------------------|-------|------|------|------|
| 1    | Discussed TensorFlow vs PyTorch yesterday | 0.85  | 0.88 | 0.90 | 0.70 |
| 2    | Working on computer vision project        | 0.84  | 0.85 | 0.80 | 0.85 |
| 3    | User prefers Python for ML projects       | 0.71  | 0.95 | 0.20 | 0.90 |
| 4    | User is senior ML engineer                | 0.59  | 0.72 | 0.10 | 1.00 |
Tip: Weighting recency heavily makes recent conversations dominate the ranking, while weighting relevance heavily prioritizes semantic similarity instead. Balance is key!
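To see where the scores come from, the top result works out by hand with the 50/30/20 weights as:

Score = (0.5 × 0.88) + (0.3 × 0.90) + (0.2 × 0.70) = 0.44 + 0.27 + 0.14 = 0.85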

⚖️ Key Scoring Factors

🎯 Semantic Relevance

Cosine similarity between query and memory embeddings. Captures semantic meaning.

Formula: cos(query_vec, memory_vec)
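A minimal sketch of this computation, assuming the embeddings arrive as NumPy arrays (the embedding model that produces them is out of scope here):

```python
import numpy as np

def cosine_similarity(query_vec: np.ndarray, memory_vec: np.ndarray) -> float:
    """Cosine similarity between a query embedding and a memory embedding."""
    denom = np.linalg.norm(query_vec) * np.linalg.norm(memory_vec)
    # Guard against zero vectors, which would otherwise divide by zero.
    return float(np.dot(query_vec, memory_vec) / denom) if denom else 0.0
```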

⏰ Temporal Recency

Exponential decay based on time since memory creation. Recent = more relevant.

Formula: e^(-λ × age_in_hours)
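The decay formula is a one-liner; λ (decay_rate below) is a tuning knob, and the illustrative value λ = 0.01 halves a memory's recency score roughly every 69 hours (ln 2 / λ):

```python
import math

def recency_score(age_in_hours: float, decay_rate: float = 0.01) -> float:
    """Exponential decay: 1.0 for a brand-new memory, approaching 0 with age."""
    return math.exp(-decay_rate * age_in_hours)
```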

⭐ Memory Importance

User-defined or model-assigned importance score. Core facts rank higher.

Scale: 0.0 (trivial) to 1.0 (critical)

🧮 Combined Ranking Formula

Final Score = (α × relevance) + (β × recency) + (γ × importance)

α (alpha): relevance weight, typically 0.5-0.7
β (beta): recency weight, typically 0.2-0.3
γ (gamma): importance weight, typically 0.1-0.3
Weights must sum to 1.0. Adjust based on your application needs.
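A sketch of the full ranking step under the assumptions above; the Memory fields and helper names are illustrative, not a fixed API, and relevance and recency are assumed to be precomputed as shown earlier:

```python
from dataclasses import dataclass

@dataclass
class Memory:
    text: str
    relevance: float   # cos(query_vec, memory_vec), precomputed
    recency: float     # e^(-λ × age_in_hours), precomputed
    importance: float  # 0.0 (trivial) to 1.0 (critical)

def rank_memories(memories: list[Memory],
                  alpha: float = 0.5,
                  beta: float = 0.3,
                  gamma: float = 0.2) -> list[Memory]:
    """Sort memories by the weighted score; α + β + γ must equal 1.0."""
    assert abs(alpha + beta + gamma - 1.0) < 1e-9, "weights must sum to 1.0"
    def score(m: Memory) -> float:
        return alpha * m.relevance + beta * m.recency + gamma * m.importance
    return sorted(memories, key=score, reverse=True)

# Reproduces the worked example above: the TensorFlow/PyTorch memory
# (0.5×0.88 + 0.3×0.90 + 0.2×0.70 = 0.85) comes out on top.
ranked = rank_memories([
    Memory("User prefers Python for ML projects", 0.95, 0.20, 0.90),
    Memory("Discussed TensorFlow vs PyTorch yesterday", 0.88, 0.90, 0.70),
    Memory("Working on computer vision project", 0.85, 0.80, 0.85),
    Memory("User is senior ML engineer", 0.72, 0.10, 1.00),
])
```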

🚀 Advanced Scoring Techniques

Personalization: Learn user-specific weights over time. If a user frequently references old facts, increase the importance weight.
Context-Aware Scoring: Adjust weights based on query type; factual queries prioritize importance, while conversations prioritize recency.
Diversity Penalty: Reduce the scores of memories too similar to already-selected ones to avoid redundancy (MMR, Maximal Marginal Relevance; see the sketch after this list).
Cross-Encoder Reranking: Use a heavier transformer model to rerank the top candidates with query-memory cross-attention for maximum accuracy.
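A sketch of MMR selection, assuming the same NumPy embedding vectors as above; lambda_param trades relevance (1.0 means pure relevance) against diversity (lower values penalize redundancy more):

```python
import numpy as np

def mmr_select(query_vec: np.ndarray, candidate_vecs: list[np.ndarray],
               k: int = 5, lambda_param: float = 0.7) -> list[int]:
    """Greedily pick k candidates, penalizing similarity to already-picked ones."""
    def cos(a: np.ndarray, b: np.ndarray) -> float:
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    remaining = list(range(len(candidate_vecs)))
    selected: list[int] = []
    while remaining and len(selected) < k:
        def mmr_score(i: int) -> float:
            relevance = cos(query_vec, candidate_vecs[i])
            # Redundancy = similarity to the closest already-selected memory.
            redundancy = max((cos(candidate_vecs[i], candidate_vecs[j])
                              for j in selected), default=0.0)
            return lambda_param * relevance - (1 - lambda_param) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected  # indices into candidate_vecs, in selection order
```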