Classification of Underrepresented Text Data in an Imbalanced Dataset Using Deep Neural Network

Authors: Tajbia Hossain, Raqeebir Rab, Humaira Zahin Mauni
Publication year: 2020
Subject:
Source: 2020 IEEE Region 10 Symposium (TENSYMP).
DOI: 10.1109/tensymp50017.2020.9231021
Description: Text classification is a well-researched, much-explored topic for burgeoning researchers in the field of data mining. Yet the inadequacies that actual text data present when these models are applied in practice call for further study. Imbalanced datasets, in particular, are a roadblock for common high-performance algorithms. Thus, our research explores text classification on imbalanced datasets containing underrepresented categories, using the favored machine learning algorithms of today: neural networks. We explore how neural network classification models can improve results on inadequate datasets. Our research showcases why deep learning is the preferred classification algorithm, and proves, with hard evidence, that it outperforms other machine learning algorithms in text classification.
Database: OpenAIRE