A Primer in BERTology: What We Know About How BERT Works

Published: 2020-12 Volume: 8 Pages: 842-866
ISSN: 2307-387X
Container-title: Transactions of the Association for Computational Linguistics
Language: en
Short-container-title: Transactions of the Association for Computational Linguistics

Author:

Rogers Anna¹, Kovaleva Olga², Rumshisky Anna²

Affiliation:

1. Center for Social Data Science, University of Copenhagen.

2. Dept. of Computer Science, University of Massachusetts Lowell.

Abstract

Transformer-based models have pushed state of the art in many areas of NLP, but our understanding of what is behind their success is still limited. This paper is the first survey of over 150 studies of the popular BERT model. We review the current state of knowledge about how BERT works, what kind of information it learns and how it is represented, common modifications to its training objectives and architecture, the overparameterization issue, and approaches to compression. We then outline directions for future research.

Publisher

MIT Press - Journals

Subject

Artificial Intelligence, Computer Science Applications, Linguistics and Language, Human-Computer Interaction, Communication

Link

https://www.mitpressjournals.org/doi/pdf/10.1162/tacl_a_00349

References: 180 articles.

Cited by 363 articles.

1. Supervised and unsupervised learning models for pharmaceutical drug rating and classification using consumer generated reviews;Healthcare Analytics;2024-06

2. Mining and fusing unstructured online reviews and structured public index data for hospital selection;Information Fusion;2024-03

3. DeBERTa-BiLSTM: A multi-label classification model of Arabic medical questions using pre-trained models and deep learning;Computers in Biology and Medicine;2024-03

4. The Life Cycle of Knowledge in Big Language Models: A Survey;Machine Intelligence Research;2024-01-12

5. Explainability for Large Language Models: A Survey;ACM Transactions on Intelligent Systems and Technology;2024-01-02