Classification, Clustering, and Visualisation Based on Dual Scaling

Author: Hans-Joachim Mucha
Year of publication: 2013
Subject:
Source: Studies in Classification, Data Analysis, and Knowledge Organization; ISBN: 9783319012636
DOI: 10.1007/978-3-319-01264-3_5
Description: In practice, the statistician is often faced with data that are already available, and these data are often mixed (of different scale types). The statistician must then try to draw optimal statistical conclusions with the most sophisticated methods. But are the variables scaled optimally? And what about missing data? Without loss of generality, we restrict ourselves here to binary classification/clustering. A very simple but general approach is outlined that is applicable to such data for both classification and clustering. It is based on data preparation (a down-grading step, such as binning each quantitative variable) followed by dual scaling (the up-grading step: scoring). As a byproduct, the quantitative scores can be used for multivariate visualisation of both the data and the classes/clusters. For illustrative purposes, a real-data application to optical character recognition (OCR) is considered throughout the paper. Moreover, the proposed approach is compared with other multivariate methods such as the simple Bayesian classifier.
Database: OpenAIRE
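The binning-then-scoring pipeline described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation. Equal-frequency (quantile) binning, placing missing values in their own category, summing the scores over variables, and a zero cut-off are all choices made here. The sketch relies on one fact about dual scaling of a bins x classes table with only two class columns: such a table has a single non-trivial dimension, in which each bin's score is proportional (up to sign and scale) to the share of class 1 in that bin minus the overall share of class 1. The helper names `bin_variable`, `fit`, `transform`, `score`, `predict` and `project_2d` are hypothetical.

```python
import numpy as np


def bin_variable(x, edges):
    """Map values to bins 0..n_bins-1; NaN (missing) goes to an extra bin n_bins."""
    n_bins = len(edges) - 1
    idx = np.full(x.shape, n_bins, dtype=int)   # default: the 'missing' category
    ok = ~np.isnan(x)
    idx[ok] = np.digitize(x[ok], edges[1:-1])   # interior edges -> 0..n_bins-1
    return idx


def fit(X, y, n_bins=5):
    """Down-grade each column by quantile binning, then up-grade it by scoring."""
    y = np.asarray(y, dtype=float)              # class labels coded 0/1
    p1 = y.mean()                               # overall share of class 1
    model = []
    for j in range(X.shape[1]):
        x = X[:, j]
        edges = np.nanquantile(x, np.linspace(0.0, 1.0, n_bins + 1))
        idx = bin_variable(x, edges)
        n = np.bincount(idx, minlength=n_bins + 1)              # bin sizes
        n1 = np.bincount(idx, weights=y, minlength=n_bins + 1)  # class-1 counts
        # Two-column table: the one non-trivial dual-scaling dimension scores each
        # bin by (class-1 share in bin - overall class-1 share), up to sign/scale.
        # Empty bins get the overall share, i.e. a neutral score of 0.
        share = np.divide(n1, n, out=np.full(n_bins + 1, p1), where=n > 0)
        model.append((edges, share - p1))
    return model


def transform(model, X):
    """Replace every value by the score of its bin (the quantified data)."""
    return np.column_stack(
        [s[bin_variable(X[:, j], edges)] for j, (edges, s) in enumerate(model)]
    )


def score(model, X):
    """Total score of each observation: sum of its per-variable scores."""
    return transform(model, X).sum(axis=1)


def predict(model, X):
    """Assumed decision rule: class 1 if the total score is positive."""
    return (score(model, X) > 0).astype(int)


def project_2d(Z):
    """PCA of the quantified data via SVD: first two principal component scores."""
    Zc = Z - Z.mean(axis=0)
    _, _, Vt = np.linalg.svd(Zc, full_matrices=False)
    return Zc @ Vt[:2].T


if __name__ == "__main__":
    # Tiny made-up example: column 0 separates the classes, column 1 has gaps.
    X = np.array([[1.0, np.nan], [2.0, 5.0], [3.0, 1.0],
                  [4.0, 2.0], [5.0, np.nan], [6.0, 3.0]])
    y = np.array([0, 0, 0, 1, 1, 1])
    model = fit(X, y, n_bins=2)
    print(predict(model, X))
    print(project_2d(transform(model, X)))
```

On the training data, each variable's scores have a weighted mean of zero, so zero is a natural default cut-off for the summed score. The matrix returned by `transform` is the quantified data; passing it to a principal component analysis (here via `numpy.linalg.svd`) gives two-dimensional coordinates for plotting observations and classes.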