1. A systematic survey of chemical pre-trained models;Xia,2023
2. BERT: Pre-training of deep bidirectional transformers for language understanding;Devlin,2018
3. Improving language understanding by generative pre-training;Radford,2018
4. Language models are few-shot learners;Brown,2020
5. An image is worth 16x16 words: Transformers for image recognition at scale;Dosovitskiy,2020