Convolutional Neural Networks to Facilitate the Continuous Recognition of Arabic Speech with Independent Speakers

Published: 2024-04-29
Volume: 2024
Pages: 1-11
ISSN: 2090-0155
Container-title: Journal of Electrical and Computer Engineering
Language: en

Author:

Sayed Sally A.¹, Ahmed Abdel Azeem Abul Seoud Rania², Abdel Naby Howida Y.¹

Affiliation:

1. Department of Computer Science, Faculty of Computers & Artificial Intelligence, Fayoum University, El Fayoum 63514, Egypt

2. Department of Electrical Engineering, Faculty of Engineering, Fayoum University, El Fayoum 63514, Egypt

Abstract

Automatic speech recognition (ASR) is the field of research concerned with enabling computers to process and interpret human speech with the highest possible accuracy. Speech is one of the simplest ways to convey a message, and as human-machine interfaces continue to evolve, speech recognition is expected to become increasingly important. Arabic, however, has features that set it apart from other languages, such as its dialects and the pronunciation of words. Until now, insufficient attention has been devoted to speaker-independent continuous Arabic speech recognition with a limited database. This research proposes two techniques for recognizing Arabic speech. The first combines convolutional neural network (CNN) and long short-term memory (LSTM) encoders with an attention-based decoder. The second is based on the Sphinx-4 recognizer from the CMU Sphinx toolkit, which includes PocketSphinx, SphinxBase, and SphinxTrain; it is evaluated with various types and numbers of extracted features (filter bank and mel-frequency cepstral coefficients (MFCC)) and generates a language model for different sentences spoken by different speakers. Both approaches were tested on a dataset containing 7 hours of spoken Arabic from 11 Arab countries, covering the Levant, Gulf, and African regions of the Arab world, and achieved promising results. The CNN-LSTM model achieved a word error rate (WER) of 3.63% using 120 filter bank features and 4.04% using 39 MFCC features, while the Sphinx-4 recognizer achieved a WER of 8.17% (an accuracy of 91.83%) using 25 MFCC features and 8 Gaussian mixtures on the same benchmark dataset.
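
The abstract reports its results as word error rate (WER) and accuracy. As a minimal sketch of how WER is conventionally computed, and assuming the standard definition (word-level edit distance divided by the number of reference words), the Python below may help in reading those figures. The function names and the toy sentences are illustrative assumptions and are not taken from the paper; the authors' own scoring tools may differ in tokenization and normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def word_accuracy(wer: float) -> float:
    """Accuracy under the common convention accuracy = 1 - WER (an assumption here)."""
    return 1.0 - wer


# Toy example (made-up sentences): one substitution and one deletion over six words.
toy_wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Under this convention, a WER of 8.17% corresponds to an accuracy of 91.83%, which matches the pair of figures reported for the Sphinx-4 recognizer.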

Publisher

Hindawi Limited

Link

http://downloads.hindawi.com/journals/jece/2024/4976944.pdf
