🎭 Actor-Critic Architectures

Combining policy and value learning for powerful reinforcement learning


Introduction to Actor-Critic

🎯 What is Actor-Critic?

Actor-Critic combines the best of policy gradient and value-based methods. The actor learns the policy (what to do), while the critic evaluates actions by estimating value functions. This synergy reduces variance and accelerates learning.

💡
Key Insight

The critic provides a baseline that reduces the variance of policy gradient updates, making learning more stable and sample-efficient than pure policy gradients.
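In symbols, the baseline-subtracted policy gradient takes the standard form below (written generically, not tied to any particular implementation):

```latex
\nabla_\theta J(\theta)
  = \mathbb{E}_{\pi_\theta}\!\left[
      \nabla_\theta \log \pi_\theta(a \mid s)\,
      \bigl(Q^{\pi}(s,a) - V^{\pi}(s)\bigr)
    \right]
```

Subtracting the state-dependent baseline V(s) leaves the gradient unbiased, since E[∇ log π(a|s) b(s)] = 0 for any baseline that does not depend on the action, while shrinking the variance of the estimate.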

🎭
Actor Network

Learns the policy π(a|s) mapping states to action probabilities

  • Outputs action distribution
  • Updated via policy gradient
  • Guided by critic's feedback
📊
Critic Network

Estimates value function V(s) or Q(s,a) to judge action quality

  • Evaluates state/action pairs
  • Updated via TD learning
  • Provides advantage estimates
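The two networks can be sketched minimally with linear function approximators (real implementations typically use neural networks; the dimensions and class names here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

class Actor:
    """Linear-softmax policy: maps a state vector to action probabilities pi(a|s)."""
    def __init__(self, n_state_dims, n_actions):
        self.W = rng.normal(scale=0.1, size=(n_state_dims, n_actions))

    def probs(self, s):
        logits = s @ self.W
        e = np.exp(logits - logits.max())  # numerically stable softmax
        return e / e.sum()

class Critic:
    """Linear value function: maps a state vector to a scalar estimate V(s)."""
    def __init__(self, n_state_dims):
        self.w = np.zeros(n_state_dims)

    def value(self, s):
        return float(s @ self.w)

# Hypothetical example: 4-dimensional state, 2 actions
actor, critic = Actor(4, 2), Critic(4)
s = np.array([1.0, 0.0, 0.5, -0.5])
p = actor.probs(s)
print(p)  # a valid probability distribution over the 2 actions
```

The actor outputs a distribution the agent samples from, while the critic returns a single scalar judgment of the state; the two are trained with different losses but share the same experience stream.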

🔄 The Actor-Critic Loop

1
Actor Selects Action

Sample action a from policy π(a|s) given current state s

2
Environment Responds

Execute action, observe reward r and next state s'

3
Critic Evaluates

Compute TD error: δ = r + γV(s') - V(s)

4
Update Both Networks

Critic updates V(s) from the TD error; actor improves the policy using δ as an estimate of the advantage A(s,a)
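The four steps above can be sketched end-to-end as a one-step (TD) actor-critic on a hypothetical toy problem: in state 0, action 1 gives reward 1 and ends the episode, action 0 gives reward 0 and stays put. The learning rates and tabular setup are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
gamma, alpha_actor, alpha_critic = 0.9, 0.1, 0.2

n_actions = 2
theta = np.zeros(n_actions)  # actor: softmax preferences for state 0
V = np.zeros(2)              # critic: tabular state values

def policy():
    """Softmax policy pi(a|s=0) from the preference vector theta."""
    e = np.exp(theta - theta.max())
    return e / e.sum()

s = 0
for _ in range(3000):
    # 1. Actor selects action: sample a ~ pi(a|s)
    p = policy()
    a = rng.choice(n_actions, p=p)
    # 2. Environment responds (hand-coded toy dynamics)
    r = 1.0 if a == 1 else 0.0
    s_next, done = (1, True) if a == 1 else (0, False)
    # 3. Critic evaluates: TD error delta = r + gamma*V(s') - V(s)
    delta = r + (0.0 if done else gamma * V[s_next]) - V[s]
    # 4. Update both: critic by TD learning, actor by policy gradient
    #    with delta standing in for the advantage A(s,a)
    V[s] += alpha_critic * delta
    grad_log = -p
    grad_log[a] += 1.0                # gradient of log-softmax
    theta += alpha_actor * delta * grad_log
    s = 0 if done else s_next         # reset after terminal transitions

print(policy())  # preferences shift toward the rewarding action 1
```

Note that the same TD error drives both updates: it moves V(s) toward the bootstrapped target, and its sign tells the actor whether the sampled action was better or worse than expected.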

✅ Advantages

  • Lower variance than REINFORCE
  • More sample-efficient learning
  • Online and incremental updates
  • Works with continuous actions

⚠️ Challenges

  • Two networks to train simultaneously
  • Bootstrapped value estimates introduce bias
  • Sensitive to learning rates and other hyperparameters
  • Training can destabilize if actor and critic get out of step