Knowledge distillation is a fundamental AI technique that allows researchers to create smaller, more efficient models by training a 'student' model on knowledge from a larger 'teacher' model. Originally described by Geoffrey Hinton and colleagues at Google in 2015, the method has the teacher share its full probability distribution over classes (its 'soft targets', typically produced at a raised softmax temperature) rather than just its final answer, revealing 'dark knowledge' about the relationships between categories. The technique has been widely adopted by major tech companies to cut computational costs while preserving most of a model's performance; DistilBERT, for example, is roughly 40% smaller than BERT while retaining about 97% of its language-understanding performance.
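
The core training objective can be sketched in a few lines of PyTorch. This is a minimal illustration of the softened-softmax loss described by Hinton et al., not a reproduction of any particular implementation; the function name, the temperature of 2.0, and the 0.5 weighting between the two loss terms are illustrative assumptions.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    # Soften both distributions with the temperature so that small teacher
    # probabilities (the 'dark knowledge') contribute meaningfully to the loss.
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence between the teacher's and student's softened distributions,
    # scaled by T^2 to keep gradient magnitudes comparable across temperatures.
    distill = F.kl_div(soft_student, soft_targets, reduction="batchmean") * temperature ** 2
    # Standard cross-entropy against the hard ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * distill + (1.0 - alpha) * hard

# Example call: a batch of 8 examples over 10 classes (random tensors just to show shapes).
loss = distillation_loss(torch.randn(8, 10), torch.randn(8, 10), torch.randint(0, 10, (8,)))

In practice the student is trained on this combined loss, so it learns both from the ground-truth labels and from how the teacher distributes probability across the wrong answers.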