Real-time discriminant analysis in the presence of label and measurement noise
Author: | Mia Hubert, Peter J. Rousseeuw, Bart De Ketelaere, Jakob Raymaekers, Iwein Vranckx |
---|---|
Year of publication: | 2021 |
Subject: |
FOS: Computer and information sciences
Computer science; 01 natural sciences; Plot (graphics); Analytical Chemistry; Methodology (stat.ME); 03 medical and health sciences; Statistics - Methodology; Spectroscopy; 030304 developmental biology; 0303 health sciences; Training set; business.industry; Covariance matrix; Process Chemistry and Technology; 010401 analytical chemistry; Estimator; Pattern recognition; Quadratic classifier; Linear discriminant analysis; 0104 chemical sciences; Computer Science Applications; ComputingMethodologies_PATTERNRECOGNITION; Outlier; Anomaly detection; Artificial intelligence; Noise (video); business; Software |
Source: | Chemometrics and Intelligent Laboratory Systems, 208. Elsevier Science |
ISSN: | 0169-7439 |
DOI: | 10.1016/j.chemolab.2020.104197 |
Description: | Quadratic discriminant analysis (QDA) is a widely used classification technique. Based on a training dataset, each class in the data is characterized by an estimate of its center and shape, which can then be used to assign unseen observations to one of the classes. The traditional QDA rule relies on the empirical mean and covariance matrix. Unfortunately, these estimators are sensitive to label and measurement noise, which often impairs the model's predictive ability. Robust estimators of location and scatter are resistant to this type of contamination. However, they have a prohibitive computational cost for large-scale industrial experiments. We present a novel QDA method based on a recent real-time robust algorithm. We additionally integrate an anomaly detection step to classify the most atypical observations into a separate class of outliers. Finally, we introduce the label bias plot, a graphical display to identify label and measurement noise in the training data. The performance of the proposed approach is illustrated in a simulation study with huge datasets, and on real datasets about diabetes and fruit. |
Database: | OpenAIRE |
External link: |
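
The QDA rule summarized in the description (a center and a shape per class, then assignment of new observations, with an optional outlier class) can be illustrated with a short sketch. This is not the authors' real-time robust algorithm, which the record does not specify. `trimmed_estimator` is a crude concentration-step stand-in without a consistency correction, and all function names, the `-1` outlier label, and the cutoff value are assumptions made for illustration only.

```python
import numpy as np


def sq_mahalanobis(X, center, cov):
    """Squared Mahalanobis distance of each row of X to (center, cov)."""
    diff = X - center
    return np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)


def classical_estimator(X):
    """Empirical mean and covariance, as in the traditional QDA rule."""
    return X.mean(axis=0), np.cov(X, rowvar=False)


def trimmed_estimator(X, keep=0.75, n_steps=10):
    """Illustrative robust stand-in (NOT the paper's method).

    Starts from the coordinate-wise median and MAD, then repeatedly refits
    the mean and covariance on the `keep` fraction of points closest to the
    current fit. The covariance is not rescaled, so it is biased downward.
    """
    center = np.median(X, axis=0)
    mad = np.median(np.abs(X - center), axis=0)
    cov = np.diag((1.4826 * mad) ** 2)
    h = int(np.ceil(keep * len(X)))
    for _ in range(n_steps):
        closest = np.argsort(sq_mahalanobis(X, center, cov))[:h]
        center, cov = classical_estimator(X[closest])
    return center, cov


def fit_qda(X, y, estimator):
    """Store (center, covariance, prior) for every class label in y."""
    model = {}
    for label in np.unique(y):
        Xk = X[y == label]
        center, cov = estimator(Xk)
        model[label] = (center, cov, len(Xk) / len(X))
    return model


def predict_qda(X, model, outlier_cutoff=None):
    """Assign each row to the class with the highest quadratic score.

    If `outlier_cutoff` is given, rows whose smallest squared distance to
    any class exceeds it get the (assumed) outlier label -1, so class
    labels are assumed to be integers other than -1.
    """
    labels = np.array(list(model))
    params = list(model.values())
    d2 = np.column_stack([sq_mahalanobis(X, c, S) for c, S, _ in params])
    logdet = np.array([np.linalg.slogdet(S)[1] for _, S, _ in params])
    logprior = np.log([p for _, _, p in params])
    scores = logprior - 0.5 * logdet - 0.5 * d2
    pred = labels[np.argmax(scores, axis=1)]
    if outlier_cutoff is not None:
        pred = np.where(d2.min(axis=1) > outlier_cutoff, -1, pred)
    return pred


# Toy data with label noise: 40 points drawn from class 1 are labeled 0.
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0.0, 1.0, (200, 2)),
                     rng.normal(6.0, 1.0, (200, 2))])
y_train = np.repeat([0, 1], 200)
y_train[200:240] = 0

for name, est in [("classical", classical_estimator),
                  ("trimmed", trimmed_estimator)]:
    center0 = fit_qda(X_train, y_train, est)[0][0]
    print(name, "center of class 0:", np.round(center0, 2))
```

In this toy setting the mislabeled points pull the empirical mean of class 0 toward class 1, while the trimmed fit stays close to the clean center. For two variables, a cutoff such as `outlier_cutoff=9.21` corresponds to the 0.99 quantile of the chi-square distribution with 2 degrees of freedom. Because the trimmed covariance is not rescaled, that cutoff is only approximate here.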