Meta-Learning for Agents

Implement meta-learning so that agents can adapt to new tasks quickly

Model-Agnostic Meta-Learning (MAML)

MAML is one of the most widely used meta-learning algorithms. Core idea: find initial model parameters that enable fast adaptation to new tasks with just a few gradient steps. Instead of training for task performance, train for adaptability. Result: a model that can often learn from 5-10 examples what would otherwise take 1000+.
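
Formally, MAML optimizes the loss after adaptation rather than the loss at θ itself (written here in the same notation as the pseudocode below):

    min_θ  Σ_task L_task(θ - α * ∇_θ L_task(θ))

where α is the inner-loop learning rate. Because the adapted parameters depend on θ, the outer gradient flows through the inner update.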

MAML Training Process

MAML trains through nested inner and outer loops; the pseudocode below makes each step concrete:

🎲 Step 1: Sample tasks. Randomly select a batch of tasks from the task distribution.
🔄 Step 2: Inner loop. For each task, adapt the model with a few gradient steps.
📊 Step 3: Evaluate. Test the adapted model on held-out examples from each task.
🎯 Step 4: Outer loop. Update the initial parameters to improve adaptation speed.
🔁 Step 5: Repeat. Continue until the model learns a good initialization.

MAML Algorithm Details

# MAML Pseudocode
Initialize θ (model parameters)

while not converged:
    # Sample batch of tasks
    tasks = sample_tasks(task_distribution, batch_size=32)
    meta_losses = []

    for task in tasks:
        # Inner loop: task-specific adaptation on the support set
        support_data = task.get_support_set(k_shot=5)
        θ_adapted = θ - α * ∇_θ L_task(θ, support_data)

        # Evaluate the adapted model on held-out query data
        query_data = task.get_query_set()
        meta_losses.append(L_task(θ_adapted, query_data))

    # Outer loop: update the initialization; the gradient flows
    # through θ_adapted, making this a second-order update
    θ = θ - β * ∇_θ Σ meta_losses

# Result: θ is now a good initialization for fast adaptation
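
The pseudocode can be made concrete. Below is a minimal runnable sketch of second-order MAML in PyTorch, assuming a toy sine-regression task family (y = A * sin(x + phase)); the task setup and names like sample_sine_task are illustrative, not part of any library:

# Minimal second-order MAML sketch in PyTorch (illustrative toy setup)
import torch
import torch.nn as nn
from torch.func import functional_call

def sample_sine_task():
    # Each task: y = A * sin(x + phase) with random amplitude and phase
    amp = torch.empty(1).uniform_(0.1, 5.0)
    phase = torch.empty(1).uniform_(0.0, 3.1416)
    def draw(n):
        x = torch.empty(n, 1).uniform_(-5.0, 5.0)
        return x, amp * torch.sin(x + phase)
    return draw

model = nn.Sequential(nn.Linear(1, 40), nn.ReLU(), nn.Linear(40, 1))
loss_fn = nn.MSELoss()
alpha, beta = 0.01, 0.001               # inner (α) and outer (β) learning rates
meta_opt = torch.optim.Adam(model.parameters(), lr=beta)
task_batch = 32

for step in range(1000):
    meta_opt.zero_grad()
    for _ in range(task_batch):
        draw = sample_sine_task()
        x_s, y_s = draw(5)              # support set (k_shot=5)
        x_q, y_q = draw(10)             # held-out query set

        # Inner loop: one gradient step on the support set;
        # create_graph=True lets the meta-gradient flow through this update
        params = dict(model.named_parameters())
        support_loss = loss_fn(functional_call(model, params, (x_s,)), y_s)
        grads = torch.autograd.grad(support_loss, tuple(params.values()),
                                    create_graph=True)
        adapted = {name: p - alpha * g
                   for (name, p), g in zip(params.items(), grads)}

        # Outer-loop contribution: query loss of the adapted model;
        # backward() accumulates the meta-gradient into the original θ
        query_loss = loss_fn(functional_call(model, adapted, (x_q,)), y_q)
        (query_loss / task_batch).backward()

    meta_opt.step()                     # update the initialization θ

A common first-order variant (FOMAML) drops create_graph=True and detaches the inner-loop gradients, trading exactness of the meta-gradient for speed and memory.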

Key Hyperparameters

Inner LR (α): 0.01-0.1. Controls task adaptation speed. Higher = faster but less stable.
Outer LR (β): 0.001-0.01. Controls meta-learning speed. Lower = more stable meta-training.
K-Shot: 5-50. Number of examples per task during meta-training. More = better but slower.
Task Batch: 16-64 tasks. More tasks = better gradient estimate but higher memory cost.
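
As a sketch, a starting configuration within these ranges might look like this (the dictionary keys are illustrative, not a library API):

# Illustrative starting values within the ranges above
maml_config = {
    "inner_lr": 0.01,        # α: task adaptation step size
    "outer_lr": 0.001,       # β: meta-update step size
    "k_shot": 5,             # support examples per task
    "task_batch_size": 32,   # tasks per meta-update
    "inner_steps": 1,        # inner-loop gradient steps (the pseudocode above uses 1)
}
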
💡 Why MAML Works

MAML finds parameters that are sensitive to task-specific updates: small gradient steps from this initialization produce large improvements on new tasks. Think of it as finding the center of a valley where any direction (task) leads downhill (better performance). This requires a diverse task distribution during meta-training, typically at least 50-100 distinct tasks.
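
As a sketch of what a diverse task distribution can look like in practice (reusing the toy sine-task family and illustrative names from above), one might fix a pool of distinct tasks and sample batches from it during meta-training:

import random
import torch

# Illustrative pool of 100 distinct sine tasks (varied amplitude and phase)
task_pool = [(torch.empty(1).uniform_(0.1, 5.0),     # amplitude
              torch.empty(1).uniform_(0.0, 3.1416))  # phase
             for _ in range(100)]

def sample_tasks(batch_size=32):
    return random.sample(task_pool, batch_size)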
