1. Buciluă, C., Caruana, R., & Niculescu-Mizil, A. (2006). Model compression. In Proceedings of the 12th ACM SIGKDD International conference on knowledge discovery and data mining (pp. 535–541).
2. Chen, T., Kornblith, S., Norouzi, M., & Hinton, G. (2020). A simple framework for contrastive learning of visual representations. In International conference on machine learning (pp. 1597–1607).
3. Chen, D., Mei, J.-P., Zhang, H., Wang, C., Feng, Y., & Chen, C. (2022). Knowledge distillation with the reused teacher classifier. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 11933–11942).
4. Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., et al. (2021). An image is worth 16×16 words: Transformers for image recognition at scale. In International conference on learning representations.
5. Fang, Z., Wang, J., Wang, L., Zhang, L., Yang, Y., & Liu, Z. (2021). SEED: Self-supervised distillation for visual representation. In International conference on learning representations.