1. Ranking via Sinkhorn propagation;Adams Ryan Prescott;arXiv preprint arXiv:1106.1925,2011
2. Weighted transformer network for machine translation;Ahmed Karim;arXiv preprint arXiv:1711.02132,2017
3. ETC: Encoding long and structured data in transformers;Ainslie Joshua;Proceedings of EMNLP,2020
4. Layer normalization;Ba Jimmy Lei;arXiv preprint arXiv:1607.06450,2016
5. Longformer: The long-document transformer;Beltagy Iz;arXiv preprint arXiv:2004.05150,2020