Multi-label text classification with an ensemble feature space

Authors: Kushagri Tandon, Niladri Chatterjee
Year of publication: 2022
Source: Journal of Intelligent & Fuzzy Systems, vol. 42, pp. 4425-4436
ISSN: 1875-8967; 1064-1246
DOI: 10.3233/jifs-219232
Description: Multi-label text classification aims at assigning more than one class to a given text document, which makes the task both more ambiguous and more challenging. The ambiguity arises because several labels in the prescribed label set are often semantically close to each other, which makes a clear demarcation between them difficult. Consequently, any machine-learning-based approach to developing a multi-label classification scheme needs to define its feature space with features beyond linguistic or semi-linguistic ones, so that the semantic closeness between the labels is also taken into account. The present work describes a feature extraction scheme in which the training document set and the prescribed label set are intertwined in a novel way to capture this ambiguity meaningfully. In particular, experiments were conducted using topic modeling and Fuzzy C-Means clustering, which measure the underlying uncertainty with probability-based and membership-based measures, respectively. Several nonparametric hypothesis tests establish the effectiveness of the features obtained through Fuzzy C-Means clustering for multi-label classification. A new algorithm is proposed for training a multi-label classification system using this set of features.
Database: OpenAIRE
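The abstract does not specify how the authors build their features or how the label set is combined with the training documents. As a generic illustration only, the sketch below implements standard fuzzy c-means (Bezdek's algorithm) in NumPy and treats each document's cluster-membership vector as a soft feature vector. The function name `fuzzy_c_means`, the fuzzifier m = 2, and the toy 2-D vectors standing in for document representations are assumptions, not details taken from the paper.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Generic fuzzy c-means (not the paper's exact procedure).

    Returns (centers, U), where U[i, k] is the membership of sample i
    in cluster k and every row of U sums to 1.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Random initial memberships, normalized so each row sums to 1.
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        W = U ** m                                     # fuzzified weights
        centers = (W.T @ X) / W.sum(axis=0)[:, None]   # weighted means, shape (c, d)
        # Euclidean distance from each sample to each center, shape (n, c).
        D = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        D = np.fmax(D, 1e-12)                          # guard against division by zero
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1)).
        ratio = (D[:, :, None] / D[:, None, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=2)
        converged = np.max(np.abs(U_new - U)) < tol
        U = U_new
        if converged:
            break
    return centers, U

# Hypothetical toy "document vectors": two tight groups plus one point in between.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [1.0, 1.0], [1.1, 1.0], [1.0, 1.1],
              [0.5, 0.5]])
centers, U = fuzzy_c_means(X, c=2)
features = U  # each row: soft memberships usable as a feature vector
```

Points deep inside a group get a membership close to 1 for that group, while the in-between point receives substantial membership in both clusters. That graded, non-probabilistic uncertainty is the kind of membership-based measure the abstract contrasts with topic-model probabilities.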