Investigating the effects of gender, dialect, and training size on the performance of Arabic speech recognition

Published:2020-10-12 Issue:4 Volume:54 Page:975-998
ISSN:1574-020X
Container-title:Language Resources and Evaluation
language:en
Short-container-title:Lang Resources & Evaluation

Author:

Alsharhan Eiman, Ramsay Allan

Abstract

Research in Arabic automatic speech recognition (ASR) is constrained by datasets of limited size, and of highly variable content and quality. Arabic-language resources vary in the attributes that affect language resources in other languages (noise, channel, speaker, genre), but also vary significantly in the dialect and level of formality of the spoken Arabic they capture. Many languages suffer similar levels of cross-dialect and cross-register acoustic variability, but these effects have been under-studied. This paper is an experimental analysis of the interaction between classical ASR corpus-compensation methods (feature selection, data selection, gender-dependent acoustic models) and the dialect-dependent and register-dependent variation among Arabic ASR corpora. The first interaction studied is that between acoustic recording quality and discrete pronunciation variation. Discrete pronunciation variation can be compensated for by using grapheme-based instead of phone-based acoustic models, and by filtering out speakers with insufficient training data; the latter technique also helps to compensate for poor recording quality, which is further compensated for by eliminating delta-delta acoustic features. All three techniques, together, reduce Word Error Rate (WER) by between 3.24% and 5.35%. The second aspect of dialect and register variation considered is variation in the fine-grained acoustic pronunciation of each phoneme in the language. Experimental results show that gender and dialect are the principal components of variation in speech; building gender-specific and dialect-specific models therefore leads to substantial decreases in WER.
To further explore the degree of acoustic difference between the phone models required for each Arabic dialect, cross-dialect experiments are conducted to measure how far apart the dialects are acoustically, so that a better decision can be made about the minimal number of recognition systems needed to cover all dialectal Arabic. Finally, the research addresses an important question: how much training data is needed to build efficient speaker-independent ASR systems? This includes developing learning curves to find out how large the training set must be to achieve acceptable performance.

Funder

Kuwait University

Publisher

Springer Science and Business Media LLC

Subject

Library and Information Sciences,Linguistics and Language,Education,Language and Linguistics

Link

https://link.springer.com/content/pdf/10.1007/s10579-020-09505-5.pdf

Reference: 37 articles.

1. Abushariah, M., Ainon, R., Zainuddin, R., Al-Qatab, B., & Alqudah, A. (2010). Impact of a newly developed modern standard Arabic speech corpus on implementing and evaluating automatic continuous speech recognition systems. Spoken Dialogue Systems for Ambient Environments (pp. 1–12).

2. Abushariah, M. A.-A. M., Ainon, R., Zainuddin, R., Elshafei, M., & Khalifa, O. O. (2012). Arabic speaker-independent continuous automatic speech recognition based on a phonetically rich and balanced speech corpus. International Arab Journal of Information Technology (IAJIT), 9(1), 84–93.

3. Alghamdi, M., Elshafei, M., & Al-Muhtaseb, H. (2007). Arabic broadcast news transcription system. International Journal of Speech Technology, 10(4), 183–195.

4. Ali, A., Zhang, Y., Cardinal, P., Dahak, N., Vogel, S., & Glass, J. (2014). A complete kaldi recipe for building Arabic speech recognition systems. In 2014 IEEE Spoken Language Technology Workshop (SLT) (pp. 525–529).

Cited by 24 articles.

1. Neural multi-task learning for end-to-end Arabic aspect-based sentiment analysis;Computer Speech & Language;2025-01

2. Building Automatic Speech Recognition Systems for Moroccan Dialect: A Phoneme-Based Approach;SN Computer Science;2024-07-25

3. Convolutional Neural Networks to Facilitate the Continuous Recognition of Arabic Speech with Independent Speakers;Journal of Electrical and Computer Engineering;2024-04-29

4. Towards inclusive automatic speech recognition;Computer Speech & Language;2024-03

5. Modern Standard Arabic speech disorders corpus for digital speech processing applications;International Journal of Speech Technology;2024-03