1. Hornik K, Stinchcombe M, White H. Multilayer feedforward networks are universal approximators. Neural Networks. 1989;2(5):359-366. https://www.sciencedirect.com/science/article/pii/0893608089900208 (Accessed June 2023).
2. The Universal Approximation Theorem. DEEP MIND – Mathematics, Machine Learning & Computer Science (blog). March 26, 2023. https://www.deep-mind.org/2023/03/26/the-universal-approximation-theorem/ (Accessed June 2023).
3. Devlin J, Chang M-W, Lee K, Toutanova K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805. 2018. https://doi.org/10.48550/arXiv.1810.04805 (Accessed June 2023).
4. Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need. In: 31st Conference on Neural Information Processing Systems (NIPS 2017). https://papers.nips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf.
5. Radford A, Narasimhan K, Salimans T, Sutskever I. Improving language understanding by generative pre-training. 2018. https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf (Accessed June 2023).