Classification, Clustering, and Visualisation Based on Dual Scaling

Author:	Hans-Joachim Mucha
Year of publication:	2013
Subject:	Naive Bayes classifier; Multivariate statistics; Binary classification; Computer science; Correlation clustering; Without loss of generality; Optical character recognition; Data mining; Cluster analysis; Missing data
Source:	Studies in Classification, Data Analysis, and Knowledge Organization, ISBN: 9783319012636
DOI:	10.1007/978-3-319-01264-3_5
Description:	In practice, the statistician is often faced with data that are already available, and these data are often of mixed type. The statistician must then try to draw the best possible statistical conclusions using the most sophisticated methods. But are the variables scaled optimally? And what about missing data? Without loss of generality, we restrict ourselves here to binary classification/clustering. A very simple but general approach is outlined that is applicable to such data for both classification and clustering. It is based on data preparation (i.e., a down-grading step such as binning each quantitative variable) followed by dual scaling (the up-grading step: scoring). As a byproduct, the quantitative scores can be used for multivariate visualisation of both the data and the classes/clusters. For illustrative purposes, a real-data application to optical character recognition (OCR) is considered throughout the paper. Moreover, the proposed approach is compared with other multivariate methods such as the simple Bayesian classifier.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::e89f64e6e413447599e56b86d948115d https://doi.org/10.1007/978-3-319-01264-3_5
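
The two-step idea in the abstract (bin each quantitative variable, then score the resulting categories) can be illustrated with a minimal Python sketch. This is not the paper's implementation. It assumes quantile binning, scores each category by its one-dimensional correspondence-analysis coordinate in the category-by-class table (for two classes, the deviation of the category's class-1 share from the overall share, in chi-square metric), sums the scores over variables, and classifies with a threshold of 0. The function names, the number of bins, the threshold, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def bin_variable(x, n_bins=4):
    """Down-grading step: replace a quantitative variable by quantile-bin codes 0..n_bins-1."""
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.searchsorted(edges, x, side="right")

def category_scores(codes, y, n_bins):
    """Up-grading step: one score per category from the category-by-class table (y in {0, 1})."""
    table = np.zeros((n_bins, 2))
    np.add.at(table, (codes, y), 1)              # contingency counts
    c1 = table[:, 1].sum() / table.sum()         # overall share of class 1
    row_tot = table.sum(axis=1)
    # Share of class 1 within each category; empty categories fall back to c1 (score 0).
    p1 = np.divide(table[:, 1], row_tot, out=np.full(n_bins, c1), where=row_tot > 0)
    return (p1 - c1) / np.sqrt(c1 * (1 - c1))

def fit_predict(X, y, n_bins=4):
    """Sum the category scores over all variables and classify by the sign of the total."""
    total = np.zeros(X.shape[0])
    for j in range(X.shape[1]):
        codes = bin_variable(X[:, j], n_bins)
        total += category_scores(codes, y, n_bins)[codes]
    return total, (total > 0).astype(int)        # threshold 0 is an assumption

# Synthetic two-class data (illustrative only, not the OCR data from the paper).
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
X = rng.normal(size=(200, 3)) + 1.5 * y[:, None]
total_score, y_pred = fit_predict(X, y)
accuracy = (y_pred == y).mean()
```

The per-object totals in `total_score` are quantitative values, so under the same assumptions they could also serve as plotting coordinates, in the spirit of the visualisation byproduct mentioned in the description.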