🎓 Knowledge Distillation

Transfer knowledge from large models to compact ones while preserving performance


Introduction to Knowledge Distillation

🎯 What is Knowledge Distillation?

Knowledge distillation transfers the knowledge of a large, complex teacher model to a smaller, efficient student model. The student learns not just from hard labels, but from the teacher's soft probability distributions.

💡
Key Concept

Soft targets contain more information than hard labels: they reveal class similarities and uncertainties.

👥 Teacher-Student Framework

👨‍🏫

Teacher Model

  • Large, complex architecture
  • High accuracy (95%+)
  • Slow inference
  • Pre-trained and frozen
🎓

Student Model

  • Small, efficient architecture
  • Near-teacher accuracy (93-94%)
  • Fast inference (10x+)
  • Trained with distillation
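The student's "trained with distillation" objective is typically a weighted sum of a hard-label cross-entropy term and a KL-divergence term that matches the student's temperature-softened distribution to the teacher's. The sketch below is a minimal pure-Python illustration; the logit values, temperature `T = 4.0`, and weight `alpha = 0.5` are illustrative assumptions, not values from this module:

```python
import math

def log_softmax(logits, T=1.0):
    """Log-probabilities of a temperature-scaled softmax (numerically stable)."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - lse for z in scaled]

def distillation_loss(student_logits, teacher_logits, true_label, T=4.0, alpha=0.5):
    """alpha * CE(hard label) + (1 - alpha) * T^2 * KL(teacher || student)."""
    # Hard-label cross-entropy, computed at T = 1
    ce = -log_softmax(student_logits, T=1.0)[true_label]
    # KL divergence between the temperature-softened distributions
    t_log = log_softmax(teacher_logits, T=T)
    s_log = log_softmax(student_logits, T=T)
    kl = sum(math.exp(tl) * (tl - sl) for tl, sl in zip(t_log, s_log))
    # The T^2 factor keeps soft-target gradients on the same scale as hard-label ones
    return alpha * ce + (1 - alpha) * T * T * kl
```

When the student's logits already match the teacher's, the KL term vanishes and only the (weighted) hard-label loss remains; any disagreement with the teacher adds a non-negative penalty.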

📊 Hard vs Soft Targets

Hard Labels (Traditional)

Cat: 1.0
Dog: 0.0

❌ No information about class relationships

Soft Targets (Distillation)

Cat: 0.85
Dog: 0.10
Tiger: 0.03

✓ Reveals class similarities (cats vs dogs vs tigers)
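Soft targets like the ones above come from applying a softmax with a raised temperature to the teacher's logits: higher temperatures flatten the distribution and expose the relative similarity between classes. A small sketch, where the logit values are made up for illustration:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax: higher temperature flattens the distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical teacher logits for the classes [cat, dog, tiger]
logits = [6.0, 3.0, 1.0]

hard = softmax(logits, temperature=1.0)  # sharp, close to one-hot
soft = softmax(logits, temperature=4.0)  # softened targets for distillation
```

At temperature 1 the distribution is nearly one-hot on "cat"; at temperature 4 the smaller logits receive visible probability mass, so the student can see that the teacher considers "dog" more cat-like than "tiger".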

✨ Benefits

📦

Model Compression

10x smaller models with minimal accuracy loss

⚡

Faster Inference

10-100x speedup for edge deployment

🎯

Better Generalization

Students often generalize better than baseline

💰

Cost Reduction

Lower infrastructure and serving costs