
Exploring data augmentation techniques for advancing emotion recognition in text - An ANOVA-RFE fusion approach with emotion fusion ensemble

Published: 2025-03-25
ISSN: 1088-467X
Journal: Intelligent Data Analysis: An International Journal
Language: en

Authors:

Varghese Babu Nirmal¹, Grace Mary Kanaga E.²

Affiliations:

1. Division of Computer Science and Engineering, School of Computer Science and Technology, Karunya Institute of Technology and Sciences, Coimbatore, India

2. Division of Data Science and Cyber Security, School of Computer Science and Technology, Karunya Institute of Technology and Sciences, Coimbatore, India

Abstract

Emotion recognition in text is a complex challenge that is essential for improving human-computer interaction. This study introduces a system built on the EmoBank dataset, which contains text annotated with six basic emotions and drawn from diverse sources such as social media, narratives, news articles, and movie scripts, with labels assigned through a combination of automated algorithms and manual annotation to ensure accuracy. The text is prepared for analysis with an extensive set of preprocessing steps: tokenization, lowercasing, stop-word removal, stemming, lemmatization, punctuation removal, handling of special characters and numbers, text normalization, and encoding. The dataset is enriched with data augmentation methods, including random insertion, random deletion, random swapping, and synonym replacement, as well as augmentation with pretrained language models such as GPT-3, BERT, and RoBERTa. Feature extraction combines word embeddings with self-attention mechanisms to capture contextual and semantic information. The ANOVA-RFE Fusion technique, which combines analysis of variance (ANOVA) with recursive feature elimination (RFE), is applied for feature selection, and the Emotion Fusion Ensemble (EFE) method improves classification by combining Random Forest, Gradient Boosting, AdaBoost, XGBoost, Extra Trees, SVM, and K-NN classifiers. The system's performance is evaluated through systematic experimentation, with hyperparameters tuned by grid search. Notably, the combination of GPT-3 augmentation, word embeddings (WE), ANOVA-RFE feature selection, and the EFE classifier (GPT-3+WE+ANOVA-RFE+EFE) achieved 89% accuracy before tuning and 94% after tuning. These results underscore the critical role of integrating preprocessing, augmentation, and ensemble learning in advancing emotion recognition; future work could explore emerging language models, novel augmentation techniques, and domain-specific adaptations to build more accurate and robust systems.
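The four token-level operations listed in the abstract (random insertion, random deletion, random swapping, and synonym replacement) are the same operations popularized as "Easy Data Augmentation" (EDA). The following is a minimal Python sketch of them, not the authors' implementation: the toy `SYNONYMS` table and all function names are hypothetical, and a real system would usually draw synonyms from a lexicon such as WordNet.

```python
import random

# Hypothetical toy synonym table; a real system would query a lexicon such as WordNet.
SYNONYMS = {
    "happy": ["glad", "joyful"],
    "sad": ["unhappy", "down"],
    "movie": ["film"],
    "great": ["excellent", "wonderful"],
}


def synonym_replacement(tokens, n, rng):
    """Replace up to n tokens that have an entry in SYNONYMS with a random synonym."""
    out = list(tokens)
    candidates = [i for i, t in enumerate(out) if t in SYNONYMS]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        out[i] = rng.choice(SYNONYMS[out[i]])
    return out


def random_insertion(tokens, n, rng):
    """n times, insert a synonym of a random eligible token at a random position."""
    out = list(tokens)
    for _ in range(n):
        candidates = [t for t in out if t in SYNONYMS]
        if not candidates:
            break
        synonym = rng.choice(SYNONYMS[rng.choice(candidates)])
        out.insert(rng.randint(0, len(out)), synonym)
    return out


def random_swap(tokens, n, rng):
    """n times, swap the positions of two randomly chosen tokens."""
    out = list(tokens)
    if len(out) < 2:
        return out
    for _ in range(n):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(tokens, p, rng):
    """Drop each token with probability p, always keeping at least one token."""
    if len(tokens) <= 1:
        return list(tokens)
    kept = [t for t in tokens if rng.random() > p]
    return kept if kept else [rng.choice(tokens)]


if __name__ == "__main__":
    rng = random.Random(42)
    sentence = "i was so happy after the great movie".split()
    print(synonym_replacement(sentence, 2, rng))
    print(random_insertion(sentence, 1, rng))
    print(random_swap(sentence, 1, rng))
    print(random_deletion(sentence, 0.2, rng))
```

Passing a seeded `random.Random` instance makes each augmentation reproducible, which helps when comparing augmented and unaugmented training runs.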
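The abstract states that word embeddings are combined with self-attention but does not give the architecture. As an illustration only, the sketch below applies single-head scaled dot-product self-attention to one sentence's word vectors and mean-pools the result into a fixed-length feature vector. The random embeddings and projection matrices stand in for pretrained and learned weights, and the mean-pooling step and the function names are assumptions.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(embeddings, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    embeddings: (seq_len, d_model) word vectors for one sentence.
    w_q, w_k, w_v: (d_model, d_k) projection matrices.
    Returns contextualized vectors (seq_len, d_k) and attention weights (seq_len, seq_len).
    """
    q = embeddings @ w_q
    k = embeddings @ w_k
    v = embeddings @ w_v
    # Each row of `weights` says how strongly one word attends to every word in the sentence.
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


def sentence_features(embeddings, w_q, w_k, w_v):
    """Mean-pool the attended vectors into one fixed-length feature vector (assumed pooling)."""
    context, _ = self_attention(embeddings, w_q, w_k, w_v)
    return context.mean(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq_len, d_model, d_k = 6, 50, 16
    emb = rng.normal(size=(seq_len, d_model))  # stand-in for pretrained word embeddings
    w_q, w_k, w_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
    print(sentence_features(emb, w_q, w_k, w_v).shape)  # (16,)
```

Pooling gives every sentence the same feature length regardless of word count, which is what the downstream feature selector and tree-based classifiers expect.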
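The abstract does not say how the ANOVA and RFE results are fused. One plausible reading, sketched here with scikit-learn, is to rank every feature separately by its ANOVA F-score (`f_classif`) and by its RFE elimination order, then keep the `k` features with the best average rank. The rank-averaging rule, the logistic-regression estimator inside RFE, and the helper `fused_anova_rfe_selection` are all hypothetical choices, not the paper's published method.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, f_classif
from sklearn.linear_model import LogisticRegression


def fused_anova_rfe_selection(X, y, k):
    """Return sorted indices of k features chosen by averaging ANOVA and RFE ranks.

    Hypothetical fusion rule: rank 1 is best under each criterion; lowest mean rank wins.
    """
    # ANOVA F-test: a larger F means the class means differ more relative to
    # the within-class variance of that feature.
    f_scores, _ = f_classif(X, y)
    f_scores = np.nan_to_num(f_scores, nan=0.0)  # constant features yield NaN
    anova_rank = np.argsort(np.argsort(-f_scores, kind="stable"), kind="stable") + 1

    # RFE refits the estimator and drops the least important feature each round
    # until one remains, so ranking_ gives every feature a distinct rank (1 = kept longest).
    rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=1, step=1)
    rfe_rank = rfe.fit(X, y).ranking_

    fused = (anova_rank + rfe_rank) / 2.0
    return np.sort(np.argsort(fused, kind="stable")[:k])


if __name__ == "__main__":
    X, y = make_classification(
        n_samples=300, n_features=20, n_informative=5, n_classes=3, random_state=0
    )
    selected = fused_anova_rfe_selection(X, y, k=8)
    print("selected feature indices:", selected)
    X_reduced = X[:, selected]
    print(X_reduced.shape)  # (300, 8)
```

The ANOVA filter is cheap and model-agnostic, while RFE accounts for how features interact inside a fitted model; averaging the two ranks is a simple way to let each criterion temper the other.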
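The abstract names the EFE members but not how their predictions are combined. A common choice, assumed here, is soft voting (averaging the members' predicted class probabilities and taking the most probable class) with scikit-learn's `VotingClassifier`, tuned with `GridSearchCV`. XGBoost is left out so the sketch depends only on scikit-learn; `build_efe`, `PARAM_GRID`, and the grid values are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_efe(random_state=0):
    """Soft-voting ensemble over the scikit-learn members named in the abstract."""
    members = [
        ("rf", RandomForestClassifier(n_estimators=100, random_state=random_state)),
        ("gb", GradientBoostingClassifier(random_state=random_state)),
        ("ada", AdaBoostClassifier(random_state=random_state)),
        ("et", ExtraTreesClassifier(n_estimators=100, random_state=random_state)),
        # SVC needs probability=True for soft voting; scaling helps both SVM and K-NN.
        ("svm", make_pipeline(StandardScaler(), SVC(probability=True, random_state=random_state))),
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
    ]
    return VotingClassifier(estimators=members, voting="soft")


# Hypothetical grid; keys follow scikit-learn's "<member>__<step>__<param>" naming.
PARAM_GRID = {
    "svm__svc__C": [1.0, 10.0],
    "knn__kneighborsclassifier__n_neighbors": [3, 7],
}


if __name__ == "__main__":
    X, y = make_classification(
        n_samples=300, n_features=10, n_informative=5, n_classes=3, random_state=0
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=0
    )
    search = GridSearchCV(build_efe(), PARAM_GRID, cv=3, scoring="accuracy")
    search.fit(X_train, y_train)
    print(search.best_params_)
    print(f"held-out accuracy: {search.score(X_test, y_test):.3f}")
```

Comparing `build_efe()` fitted with default settings against `search.best_estimator_` on the same held-out split is one way to reproduce a "before tuning" versus "after tuning" comparison like the one reported in the abstract.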

Publisher

SAGE Publications

Link

https://journals.sagepub.com/doi/pdf/10.1177/1088467X251325354
