🎓 Knowledge Distillation

Transfer knowledge from large models to compact ones while preserving performance


Introduction to Knowledge Distillation

🎯 What is Knowledge Distillation?

Knowledge distillation transfers the knowledge of a large, complex teacher model to a smaller, efficient student model. The student learns not just from hard labels, but from the teacher's soft probability distributions.

💡
Key Concept

Soft targets contain more information than hard labels: they reveal class similarities and uncertainties.

👥 Teacher-Student Framework

👨‍🏫

Teacher Model

  • Large, complex architecture
  • High accuracy (95%+)
  • Slow inference
  • Pre-trained and frozen
🎓

Student Model

  • Small, efficient architecture
  • Near-teacher accuracy (93-94%)
  • Fast inference (10x+)
  • Trained with distillation
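The student's "trained with distillation" objective is typically a weighted sum of a hard-label cross-entropy term and a KL-divergence term that matches the student's temperature-softened distribution to the teacher's. The sketch below is a minimal pure-Python illustration; the logit values, temperature `T = 4.0`, and weight `alpha = 0.5` are illustrative assumptions, not values from this module:

```python
import math

def log_softmax(logits, T=1.0):
    """Log-probabilities of a temperature-scaled softmax (numerically stable)."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - lse for z in scaled]

def distillation_loss(student_logits, teacher_logits, true_label, T=4.0, alpha=0.5):
    """alpha * CE(hard label) + (1 - alpha) * T^2 * KL(teacher || student)."""
    # Hard-label cross-entropy, computed at T = 1
    ce = -log_softmax(student_logits, T=1.0)[true_label]
    # KL divergence between the temperature-softened distributions
    t_log = log_softmax(teacher_logits, T=T)
    s_log = log_softmax(student_logits, T=T)
    kl = sum(math.exp(tl) * (tl - sl) for tl, sl in zip(t_log, s_log))
    # The T^2 factor keeps soft-target gradients on the same scale as hard-label ones
    return alpha * ce + (1 - alpha) * T * T * kl
```

When the student's logits already match the teacher's, the KL term vanishes and only the (weighted) hard-label loss remains; any disagreement with the teacher adds a non-negative penalty.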

📊 Hard vs Soft Targets

Hard Labels (Traditional)

Cat: 1.0
Dog: 0.0

❌ No information about class relationships

Soft Targets (Distillation)

Cat: 0.85
Dog: 0.10
Tiger: 0.03

✓ Reveals class similarities (cats vs dogs vs tigers)
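Soft targets like the ones above come from applying a softmax with a raised temperature to the teacher's logits: higher temperatures flatten the distribution and expose the relative similarity between classes. A small sketch, where the logit values are made up for illustration:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax: higher temperature flattens the distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical teacher logits for the classes [cat, dog, tiger]
logits = [6.0, 3.0, 1.0]

hard = softmax(logits, temperature=1.0)  # sharp, close to one-hot
soft = softmax(logits, temperature=4.0)  # softened targets for distillation
```

At temperature 1 the distribution is nearly one-hot on "cat"; at temperature 4 the smaller logits receive visible probability mass, so the student can see that the teacher considers "dog" more cat-like than "tiger".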

✨ Benefits

📦

Model Compression

10x smaller models with minimal accuracy loss

⚡

Faster Inference

10-100x speedup for edge deployment

🎯

Better Generalization

Students often generalize better than baseline

💰

Cost Reduction

Lower infrastructure and serving costs