Arabic Sentiment Analysis Based on Word Embeddings and Deep Learning

Author:

Elhassan Nasrin1,Varone Giuseppe2ORCID,Ahmed Rami1,Gogate Mandar3,Dashtipour Kia3ORCID,Almoamari Hani4,El-Affendi Mohammed A.5ORCID,Al-Tamimi Bassam Naji6,Albalwy Faisal78ORCID,Hussain Amir3ORCID

Affiliation:

1. College of Computer Sciences and Information Technology, Sudan University of Science and Technology, Khartoum P.O. Box 407, Sudan

2. Department of Physical Therapy, Movement and Rehabilitation Science, Northeastern University, Boston, MA 02115, USA

3. School of Computing, Edinburgh Napier University, Edinburgh EH10 5DT, UK

4. Faculty of Computer and Information Systems, Islamic University of Madinah, Medina 42351, Saudi Arabia

5. Department of Computer Science, College of Computer and Information Sciences, Prince Sultan University, Riyadh 12435, Saudi Arabia

6. School of Computing and Digital Technology, Birmingham City University, Birmingham B4 7XG, UK

7. Department of Computer Science, College of Computer Science and Engineering, Taibah University, Madinah 42353, Saudi Arabia

8. Division of Informatics, Imaging and Data Sciences, School of Health Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, Stopford Building, Oxford Road, Manchester M13 9PL, UK

Abstract

Social media networks have grown exponentially over the last two decades, providing the opportunity for users of the internet to communicate and exchange ideas on a variety of topics. The outcome is that opinion mining plays a crucial role in analyzing user opinions and applying these to guide choices, making it one of the most popular areas of research in the field of natural language processing. Despite the fact that several languages, including English, have been the subjects of several studies, not much has been conducted in the area of the Arabic language. The morphological complexities and various dialects of the language make semantic analysis particularly challenging. Moreover, the lack of accurate pre-processing tools and limited resources are constraining factors. This novel study was motivated by the accomplishments of deep learning algorithms and word embeddings in the field of English sentiment analysis. Extensive experiments were conducted based on supervised machine learning in which word embeddings were exploited to determine the sentiment of Arabic reviews. Three deep learning algorithms, convolutional neural networks (CNNs), long short-term memory (LSTM), and a hybrid CNN-LSTM, were introduced. The models used features learned by word embeddings such as Word2Vec and fastText rather than hand-crafted features. The models were tested using two benchmark Arabic datasets: Hotel Arabic Reviews Dataset (HARD) for hotel reviews and Large-Scale Arabic Book Reviews (LARB) for book reviews, with different setups. Comparative experiments utilized the three models with two-word embeddings and different setups of the datasets. The main novelty of this study is to explore the effectiveness of using various word embeddings and different setups of benchmark datasets relating to balance, imbalance, and binary and multi-classification aspects. Findings showed that the best results were obtained in most cases when applying the fastText word embedding using the HARD 2-imbalance dataset for all three proposed models: CNN, LSTM, and CNN-LSTM. Further, the proposed CNN model outperformed the LSTM and CNN-LSTM models for the benchmark HARD dataset by achieving 94.69%, 94.63%, and 94.54% accuracy with fastText, respectively. Although the worst results were obtained for the LABR 3-imbalance dataset using both Word2Vec and FastText, they still outperformed other researchers’ state-of-the-art outcomes applying the same dataset.

Funder

UK Engineering and Physical Sciences Research Council

Publisher

MDPI AG

Subject

Computer Networks and Communications,Human-Computer Interaction

Cited by 8 articles. 订阅此论文施引文献 订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3