EMSMOTE: Ensemble multiclass synthetic minority oversampling technique to improve accuracy of multilingual sentiment analysis on imbalance data

Published

20-12-2024

DOI:

https://doi.org/10.58414/SCIENTIFICTEMPER.2024.15.4.17

Keywords:

Sentiment analysis, Natural language processing, Multilingual dataset, Imbalance classification, SMOTE.

Section

SECTION C: ARTIFICIAL INTELLIGENCE, ENGINEERING, TECHNOLOGY

Authors

  • Ayesha Shakith Department of Computer Science, St. Joseph’s College (Autonomous), Affiliated to Bharathidasan University, Trichy, India.
  • L. Arockiam Department of Computer Science, St. Joseph’s College (Autonomous), Affiliated to Bharathidasan University, Trichy, India.

Abstract

Natural language processing (NLP) tasks such as multilingual sentiment analysis are inherently challenging, especially when the training data are imbalanced. A dataset is imbalanced when one class greatly outnumbers the others. In many domains the minority class carries the most important information, yet classifiers trained on skewed data tend to be biased toward the majority class. This research addresses the problem with an ensemble-based oversampling technique, EMSMOTE (Ensemble Multiclass Synthetic Minority Oversampling Technique). EMSMOTE applies SMOTE to generate multiple synthetic datasets, which are used to train several classifiers. Combined with an ensemble random forest classifier, the proposed model achieved an accuracy of 90.73%. The ensemble approach mitigates the effect of the noisy synthetic samples that SMOTE can introduce and substantially improves overall performance on class-imbalanced data.
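The abstract says EMSMOTE uses SMOTE to build several synthetic datasets and trains a classifier on each. The sketch below illustrates that idea under stated assumptions; it is not the authors' implementation. Here multiclass SMOTE oversamples every class up to the size of the largest one, creating each synthetic point as x_i + gap * (x_nn - x_i), with gap drawn uniformly from [0, 1) and x_nn one of the k nearest same-class neighbours of x_i. Each ensemble member gets its own random seed, and members are combined by majority vote; the voting rule is an assumption, since the abstract does not state how the classifiers are combined. The paper pairs EMSMOTE with a random forest, but to keep the example dependency-free a hypothetical nearest-centroid classifier stands in as the base learner. All function names are illustrative.

```python
import math
import random
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def smote_multiclass(X, y, k=3, seed=0):
    """Oversample every class up to the size of the largest class with SMOTE.

    Each synthetic point lies on the segment between a randomly chosen sample
    of the class and one of its k nearest neighbours from the same class.
    Classes with fewer than 2 samples are left unchanged (nothing to interpolate).
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = [list(x) for x in X], list(y)
    for label, n in counts.items():
        if n >= target or n < 2:
            continue
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            i = rng.choice(idx)
            # k nearest same-class neighbours of sample i, excluding i itself
            neighbours = sorted(
                (j for j in idx if j != i), key=lambda j: euclidean(X[i], X[j])
            )[:k]
            j = rng.choice(neighbours)
            gap = rng.random()  # uniform in [0, 1)
            X_out.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
            y_out.append(label)
    return X_out, y_out


def fit_centroids(X, y):
    """Hypothetical stand-in base learner: one mean feature vector per class."""
    counts = Counter(y)
    sums = {}
    for x, lab in zip(X, y):
        acc = sums.setdefault(lab, [0.0] * len(x))
        for d, v in enumerate(x):
            acc[d] += v
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def predict_centroid(model, x):
    """Assign x to the class whose centroid is closest."""
    return min(model, key=lambda lab: euclidean(model[lab], x))


def fit_emsmote_ensemble(X, y, n_members=5, k=3):
    """Train one base model per independently seeded SMOTE-resampled dataset."""
    return [fit_centroids(*smote_multiclass(X, y, k=k, seed=s)) for s in range(n_members)]


def predict_ensemble(models, x):
    """Combine member predictions by majority vote (an assumed combination rule)."""
    votes = Counter(predict_centroid(m, x) for m in models)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy 2-D feature vectors: 6 "neg", 3 "neu", 2 "pos" samples (imbalanced).
    X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2], [0.0, 0.4], [0.2, 0.2],
         [1.0, 1.0], [1.1, 0.9], [0.9, 1.2],
         [2.0, 2.1], [2.2, 1.9]]
    y = ["neg"] * 6 + ["neu"] * 3 + ["pos"] * 2
    _, y_res = smote_multiclass(X, y, k=2, seed=1)
    print(Counter(y_res))  # each class now has 6 samples
    models = fit_emsmote_ensemble(X, y, n_members=5, k=2)
    print(predict_ensemble(models, [2.1, 2.0]))  # → pos
```

Because each member is trained on a differently seeded set of synthetic points, a single noisy synthetic sample can sway only the members whose datasets contain it; this is one plausible reading of the abstract's claim that the ensemble dampens SMOTE noise.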

How to Cite

Ayesha Shakith, & L. Arockiam. (2024). EMSMOTE: Ensemble multiclass synthetic minority oversampling technique to improve accuracy of multilingual sentiment analysis on imbalance data. The Scientific Temper, 15(04), 3099–3104. https://doi.org/10.58414/SCIENTIFICTEMPER.2024.15.4.17