Affiliations:
1. Division of Computer Science and Engineering, School of Computer Science and Technology, Karunya Institute of Technology and Sciences, Coimbatore, India
2. Division of Data Science and Cyber Security, School of Computer Science and Technology, Karunya Institute of Technology and Sciences, Coimbatore, India
Abstract
Recognizing emotion in text is a difficult problem and an important one for improving human-computer interaction. This study introduces an emotion recognition system built on the EmoBank dataset, which contains text annotated with six basic emotions drawn from diverse sources such as social media, narratives, news articles, and movie scripts, with labels produced by a combination of automated algorithms and manual annotation. The text is prepared for analysis through extensive preprocessing: tokenization, lowercasing, stop-word removal, stemming, lemmatization, punctuation removal, handling of special characters and numbers, text normalization, and encoding. The dataset is then enriched through data augmentation, using random insertion, deletion, swapping, and synonym replacement as well as Large Language Models (LLMs) such as GPT-3, BERT, and RoBERTa. Feature extraction combines word embeddings (WE) with self-attention mechanisms to capture contextual and semantic information. The ANOVA-RFE Fusion technique is applied for feature selection, and the Emotion Fusion Ensemble (EFE) method performs classification by combining Random Forest, Gradient Boosting, AdaBoost, XGBoost, Extra Trees, SVM, and k-NN. The system's performance is validated through systematic experimentation, with hyperparameters tuned by grid search. The best configuration, GPT-3+WE+ANOVA-RFE+EFE, achieved 89% accuracy before tuning and 94% after tuning. These results underscore the role of integrated preprocessing, augmentation, and ensemble learning in advancing emotion recognition. Future work could explore emerging language models, novel augmentation techniques, and domain-specific adaptations to develop more accurate and robust systems.
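The rule-based augmentation operations named in the abstract (random insertion, deletion, swapping, and synonym replacement) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the parameters `p`, `n`, and `n_swaps`, and the toy `SYNONYMS` table are all assumptions made for illustration, and a real system would draw synonyms from a proper lexical resource.

```python
import random

# Hypothetical toy synonym table, for illustration only.
SYNONYMS = {
    "happy": ["glad", "joyful"],
    "sad": ["unhappy", "gloomy"],
    "angry": ["furious", "irate"],
}


def random_deletion(tokens, p, rng):
    """Drop each token with probability p, always keeping at least one."""
    if len(tokens) <= 1:
        return list(tokens)
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


def random_swap(tokens, n_swaps, rng):
    """Swap two randomly chosen positions, n_swaps times."""
    out = list(tokens)
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def synonym_replacement(tokens, n, rng):
    """Replace up to n tokens that have an entry in SYNONYMS."""
    out = list(tokens)
    candidates = [i for i, t in enumerate(out) if t in SYNONYMS]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        out[i] = rng.choice(SYNONYMS[out[i]])
    return out


def random_insertion(tokens, n, rng):
    """Insert a synonym of a random known token at a random position, n times."""
    out = list(tokens)
    for _ in range(n):
        known = [t for t in out if t in SYNONYMS]
        if not known:
            break
        synonym = rng.choice(SYNONYMS[rng.choice(known)])
        out.insert(rng.randint(0, len(out)), synonym)
    return out


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    tokens = "i am so happy today".split()
    for augment in (random_swap, synonym_replacement, random_insertion):
        print(augment.__name__, augment(tokens, 1, rng))
    print("random_deletion", random_deletion(tokens, 0.2, rng))
```

Each function returns a new token list and leaves its input unchanged, so several augmented variants can be generated from one labeled sentence while keeping the original emotion label.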