Experimentally Studying Progressive Filtering in Presence of Input Imbalance

Authors:	Andrea Addis, Giuliano Armano, Eloisa Vargiu
Year of publication:	2013
Subject:	ComputingMethodologies_PATTERNRECOGNITION; Text categorization; Binary classification; Categorization; business.industry; Computer science; Pattern recognition; Artificial intelligence; business; Classifier (UML); Selection algorithm
Source:	Communications in Computer and Information Science; ISBN: 9783642297632; IC3K
DOI:	10.1007/978-3-642-29764-9_4
Description:	Progressive Filtering (PF) is a simple categorization technique framed within the local classifier per node approach. In PF, each classifier is entrusted with deciding whether or not the input at hand can be forwarded to its children. A simple way to implement PF consists of unfolding the given taxonomy into pipelines of classifiers, so that each node of a pipeline is a binary classifier able to recognize whether or not an input belongs to the corresponding class. In this chapter, we illustrate and discuss the results obtained by assessing the PF technique on text categorization. The experiments, run on the Reuters Corpus (RCV1-v2) dataset, focus on the ability of PF to deal with input imbalance. In particular, they (i) compare the results with those obtained using the corresponding flat approach; (ii) measure the improvement in performance as the pipeline depth increases; and (iii) measure performance in terms of generalization error, specialization error, misclassification error, and unknown ratio. Experimental results show that, for the adopted dataset, PF is able to counteract large imbalances between negative and positive examples. We also present and discuss further experiments aimed at assessing TSA, the greedy threshold selection algorithm adopted to perform PF, against a relaxed brute-force algorithm and the most relevant state-of-the-art algorithms.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::726583b151f7924cb51576e3a8acf0de https://doi.org/10.1007/978-3-642-29764-9_4
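
The unfolding of a taxonomy into pipelines of binary classifiers, as outlined in the abstract of this record, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy taxonomy, the keyword-based classifiers, the `threshold` parameter, and the choice of one pipeline per root-to-leaf path are all assumptions made for illustration, and the sketch does not implement TSA.

```python
from typing import Callable, Dict, List

Taxonomy = Dict[str, List[str]]          # parent category -> child categories
Classifier = Callable[[str], bool]       # binary decision: accept or reject


def unfold(taxonomy: Taxonomy, root: str) -> List[List[str]]:
    """Unfold a taxonomy into pipelines (assumed here: root-to-leaf paths)."""
    children = taxonomy.get(root, [])
    if not children:
        return [[root]]
    return [[root] + path for child in children for path in unfold(taxonomy, child)]


def keyword_classifier(keywords: set, threshold: int = 1) -> Classifier:
    """Hypothetical binary classifier: accept if enough keywords occur."""
    def classify(text: str) -> bool:
        tokens = text.lower().split()
        return sum(token in keywords for token in tokens) >= threshold
    return classify


def accepts(pipeline: List[str], classifiers: Dict[str, Classifier], text: str) -> bool:
    """An input passes a pipeline only if every node forwards it.

    all() short-circuits, so classifiers below a rejecting node are never
    invoked, which mirrors the progressive filtering of the input.
    """
    return all(classifiers[category](text) for category in pipeline)


# Toy taxonomy and classifiers (illustrative only).
taxonomy: Taxonomy = {"root": ["economy", "sports"], "economy": ["markets"]}
classifiers: Dict[str, Classifier] = {
    "root": lambda text: True,  # the root forwards every input
    "economy": keyword_classifier({"bank", "stocks", "inflation"}),
    "markets": keyword_classifier({"stocks", "shares"}),
    "sports": keyword_classifier({"match", "goal"}),
}

pipelines = unfold(taxonomy, "root")
document = "Stocks fell as inflation rose"
accepted = [p for p in pipelines if accepts(p, classifiers, document)]
```

In this sketch the `sports` pipeline rejects the document at its second node, while the `economy` then `markets` pipeline accepts it at every step. Raising a node's `threshold` makes that node stricter, which is the kind of per-node setting a threshold selection algorithm would tune.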