1. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention is all you need," in Adv. Neural Inf. Process. Syst., 2017.
2. J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," 2018, arXiv:1810.04805.
3. A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever, "Improving language understanding by generative pre-training," OpenAI blog, 2018.
4. A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever, "Language models are unsupervised multitask learners," OpenAI blog, 2019.
5. T. B. Brown et al., "Language models are few-shot learners," in Adv. Neural Inf. Process. Syst., 2020.