1. 1-bit Adam: Communication efficient large-scale training with Adam's convergence speed; Tang; Proceedings of the 38th International Conference on Machine Learning (ICML), 2021
2. EfficientNetV2: Smaller models and faster training; Tan; Proceedings of the 38th International Conference on Machine Learning (ICML), 2021
3. ImageNet Large Scale Visual Recognition Challenge
4. Do ImageNet classifiers generalize to ImageNet?; Recht; arXiv preprint, 2019
5. ZeRO: Memory Optimizations Toward Training Trillion Parameter Models