A Deep Learning Approach for Speech Emotion Recognition Optimization Using Meta-Learning

Published: 2023-12-01 Issue: 23 Volume: 12 Page: 4859
ISSN: 2079-9292
Container-title: Electronics
Language: en

Author:

Ottoni Lara Toledo Cordeiro¹, Ottoni André Luiz Carvalho², Cerqueira Jés de Jesus Fiais³

Affiliation:

1. Graduate Program in Electrical Engineering, Federal University of Bahia, Salvador 40210-910, Brazil

2. Technologic and Exact Center, Federal University of Recôncavo da Bahia, Cruz das Almas 44380-000, Brazil

3. Department of Electrical and Computer Engineering, Federal University of Bahia, Salvador 40210-910, Brazil

Abstract

Speech emotion recognition (SER) is widely applicable today, benefiting areas such as entertainment, robotics, and healthcare. This emotional understanding enhances user-machine interaction, making systems more responsive and providing more natural experiences. In robotics, SER is useful in home assistance devices, eldercare, and special education, facilitating effective communication. Additionally, in healthcare settings, it can monitor patients’ emotional well-being. However, achieving high accuracy is challenging, because it requires selecting the best combination of machine learning algorithms, hyperparameters, datasets, data augmentation, and feature extraction methods. Therefore, this study aims to develop a deep learning approach for finding optimal SER configurations. It explores optimizer settings, learning rates, data augmentation techniques, feature extraction methods, and neural architectures for the RAVDESS, TESS, SAVEE, and R+T+S (RAVDESS+TESS+SAVEE) datasets. After the best SER configurations are found, meta-learning is carried out by transferring them to two additional datasets, CREMA-D and R+T+S+C (RAVDESS+TESS+SAVEE+CREMA-D). The developed approach proved effective in finding the best configurations, achieving an accuracy of 97.01% for RAVDESS, 100% for TESS, 90.62% for SAVEE, and 97.37% for R+T+S. Furthermore, using meta-learning, the CREMA-D and R+T+S+C datasets achieved accuracies of 83.28% and 90.94%, respectively.
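
The configuration search described in the abstract can be pictured as enumerating combinations of settings and keeping the best-scoring one. The sketch below is only an illustration under stated assumptions: the option values in `SEARCH_SPACE`, the helper names `iter_configs` and `find_best_config`, and the scoring function `toy_evaluate` are all hypothetical and are not taken from the paper, which may use a different search procedure.

```python
from itertools import product

# Hypothetical search space. The five dimensions mirror those named in the
# abstract, but the option values are illustrative assumptions only.
SEARCH_SPACE = {
    "optimizer": ["adam", "sgd"],
    "learning_rate": [1e-3, 1e-4],
    "augmentation": ["none", "noise"],
    "features": ["mfcc", "mel_spectrogram"],
    "architecture": ["cnn", "lstm"],
}


def iter_configs(space):
    """Yield every combination of options as a dict (exhaustive grid)."""
    keys = list(space)
    for values in product(*(space[key] for key in keys)):
        yield dict(zip(keys, values))


def find_best_config(space, evaluate):
    """Return (best_config, best_score) for a caller-supplied evaluate(config).

    Ties keep the first configuration encountered.
    """
    best_config, best_score = None, float("-inf")
    for config in iter_configs(space):
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def toy_evaluate(config):
    """Stand-in for training and validating a model on one dataset.

    Returns an arbitrary made-up score so the sketch runs without audio data.
    """
    score = 0.0
    if config["optimizer"] == "adam":
        score += 1.0
    if config["features"] == "mfcc":
        score += 0.5
    if config["learning_rate"] == 1e-3:
        score += 0.25
    return score


if __name__ == "__main__":
    best, score = find_best_config(SEARCH_SPACE, toy_evaluate)
    print(best, score)
```

In this picture, transferring a configuration to a new dataset amounts to calling a dataset-specific evaluation function on the `best` dictionary instead of repeating the whole search.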

Funder

FAPESB

CAPES

UFBA

UFRB

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering

Link

https://www.mdpi.com/2079-9292/12/23/4859/pdf
