WEC: Weighted Ensemble of Text Classifiers

Authors: Tien Thanh Nguyen, Ashish Upadhyay, Stewart Massie, John McCall
Year of publication: 2020
Subject:
Source: CEC
DOI: 10.1109/cec48606.2020.9185641
Description: Text classification is one of the most important tasks in the field of Natural Language Processing. Existing approaches focus on two main aspects: generating an effective document representation, and selecting and refining algorithms to build the classification model. Traditional machine learning methods represent documents in vector space using features such as term frequencies, which have limitations in handling the order and semantics of words. Deep learning classifiers, although highly successful, require substantial labelled data and computational resources. In this work, a weighted ensemble of classifiers (WEC) is introduced to address the text classification problem. Instead of combining classifiers by majority vote, we associate each classifier’s prediction with its own weight. The optimal weights are obtained by minimising a loss function on the training data with the Particle Swarm Optimisation algorithm. We conducted experiments on five popular datasets and report classification performance in terms of accuracy and macro F1 score. WEC was run with several different combinations of traditional machine learning and deep learning classifiers to show its flexibility and robustness. Experimental results confirm the advantage of WEC, especially on smaller datasets.
Database: OpenAIRE
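
The abstract above describes the core mechanism: each base classifier's prediction is given its own weight, and the weights are found by minimising a loss on the training data with Particle Swarm Optimisation. The following is a minimal sketch of that idea, not the paper's implementation: it assumes a soft-vote combination of per-classifier class probabilities, weights clipped to [0, 1] and normalised to sum to one, training error as the loss, and a basic global-best PSO written by hand. The base classifiers, the synthetic stand-in dataset, and all hyper-parameters are illustrative choices; the paper itself uses text datasets and mixes of traditional and deep learning classifiers.

```python
# Hedged sketch: weighted soft-vote ensemble with PSO-tuned weights.
# All classifier choices, the loss, and PSO settings are illustrative, not from the paper.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in data so the sketch stays self-contained.
X, y = make_classification(n_samples=600, n_features=20, n_classes=3,
                           n_informative=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

base = [LogisticRegression(max_iter=1000), GaussianNB(),
        DecisionTreeClassifier(max_depth=5, random_state=0)]
for clf in base:
    clf.fit(X_tr, y_tr)

# Per-classifier class-probability predictions on the training data: shape (K, n, C).
P_tr = np.stack([clf.predict_proba(X_tr) for clf in base])

def ensemble_error(w, P, y_true):
    """Misclassification rate of the weighted soft-vote combination (illustrative loss)."""
    w = np.clip(w, 0.0, 1.0)
    w = w / (w.sum() + 1e-12)
    combined = np.tensordot(w, P, axes=(0, 0))  # (n, C)
    return np.mean(combined.argmax(axis=1) != y_true)

def pso(loss, dim, n_particles=30, iters=100, c1=1.5, c2=1.5, inertia=0.7):
    """Minimal global-best PSO over the weight vector, positions kept in [0, 1]."""
    pos = rng.random((n_particles, dim))
    vel = np.zeros_like(pos)
    pbest, pbest_val = pos.copy(), np.array([loss(p) for p in pos])
    g = pbest[pbest_val.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = inertia * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (g - pos)
        pos = np.clip(pos + vel, 0.0, 1.0)
        vals = np.array([loss(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        g = pbest[pbest_val.argmin()].copy()
    return g

# Tune the weights on the training data, then apply them to the test set.
weights = pso(lambda w: ensemble_error(w, P_tr, y_tr), dim=len(base))
weights = weights / (weights.sum() + 1e-12)

P_te = np.stack([clf.predict_proba(X_te) for clf in base])
pred = np.tensordot(weights, P_te, axes=(0, 0)).argmax(axis=1)
print("weights:", np.round(weights, 3), "test accuracy:", np.mean(pred == y_te))
```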