A study on using data clustering for feature extraction to improve the quality of classification
Authors: | Tadeusz Morzy, Maciej Piernik |
---|---|
Year of publication: | 2021 |
Subject: | Computer science, Feature extraction, 02 engineering and technology, Machine learning, Similarity (network science), Artificial Intelligence, 0202 electrical engineering, electronic engineering, information engineering, Quality (business), Statistical analysis, Cluster analysis, Confusion, Perspective (graphical), 020206 networking & telecommunications, Subject (documents), Human-Computer Interaction, ComputingMethodologies_PATTERNRECOGNITION, Hardware and Architecture, 020201 artificial intelligence & image processing, Artificial intelligence, Software, Information Systems |
Source: | Knowledge and Information Systems, 63:1771-1805 |
ISSN: | 0219-3116, 0219-1377 |
DOI: | 10.1007/s10115-021-01572-6 |
Description: | There is a certain belief among data science researchers and enthusiasts alike that clustering can be used to improve classification quality. Although this belief is fairly uncontroversial, it is also very general and therefore causes a lot of confusion around the subject. There are many ways of using clustering in classification, and it obviously cannot always improve the quality of predictions, so the question arises: in which scenarios exactly does it help? Since we were unable to find a rigorous study addressing this question, in this paper we try to shed some light on the concept of using clustering for classification. To do so, we first put forward a framework for incorporating clustering as a method of feature extraction for classification. The framework is generic with respect to similarity measures, clustering algorithms, classifiers, and datasets, and serves as a platform for answering ten essential questions regarding the studied subject. Each answer is based on a separate experiment on 16 publicly available datasets, followed by an appropriate statistical analysis. After performing the experiments and analyzing the results separately, we discuss them from a global perspective and draw general conclusions about using clustering as feature extraction for classification. |
Database: | OpenAIRE |
External link: |
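The record describes clustering used as a feature-extraction step for classification, within a framework that is generic with respect to similarity measure, clustering algorithm, and classifier. The following is a minimal, hypothetical sketch of one possible instantiation. It is not the authors' implementation. It assumes k-means with Euclidean distance as the clustering step, "distance to each centroid" as the extracted features, and a 1-nearest-neighbour classifier. The function names (`kmeans`, `augment`, `one_nn_predict`) and the toy data are illustrative only.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed similarity measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (assumed clustering algorithm); returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: euclidean(p, centroids[c]))
            groups[j].append(p)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        for j, g in enumerate(groups):
            if g:
                centroids[j] = [sum(col) / len(g) for col in zip(*g)]
    return centroids


def augment(points, centroids):
    """Hypothetical feature extraction: append distance-to-each-centroid features."""
    return [list(p) + [euclidean(p, c) for c in centroids] for p in points]


def one_nn_predict(train_x, train_y, query):
    """1-nearest-neighbour classifier (assumed) on the augmented features."""
    best = min(range(len(train_x)), key=lambda i: euclidean(train_x[i], query))
    return train_y[best]


# Toy usage with made-up data: two well-separated groups labelled "a" and "b".
train_x = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]]
train_y = ["a", "a", "b", "b"]
test_x = [[0.1, 0.0], [4.9, 5.2]]

centroids = kmeans(train_x, k=2)  # clustering is fitted on training data only
aug_train = augment(train_x, centroids)
aug_test = augment(test_x, centroids)
preds = [one_nn_predict(aug_train, train_y, q) for q in aug_test]
print(preds)  # ['a', 'b']
```

The clustering is fitted on the training set alone, and the test points are then only mapped onto the learned centroids. This keeps test labels and test structure from leaking into the extracted features. Swapping in another similarity measure, clustering algorithm, or classifier changes only the corresponding function, which mirrors the genericity the framework claims.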