The minimum regularized covariance determinant estimator
Authors: | Kris Boudt, Peter J. Rousseeuw, Steven Vanduffel, Tim Verdonck |
---|---|
Contributors: | Econometrics and Data Science, Business, Vrije Universiteit Brussel |
Language: | English |
Year of publication: | 2020 |
Subject: |
Statistics and Probability
FOS: Computer and information sciences Technology Statistics & Probability ROBUST 010103 numerical & computational mathematics 01 natural sciences Regularization (mathematics) Theoretical Computer Science Methodology (stat.ME) 010104 statistics & probability Matrix (mathematics) Dimension (vector space) Robustness (computer science) Scatter matrix Computer Science Theory & Methods Regularization Convex combination ALGORITHM 0101 mathematics Statistics - Methodology Computer. Automation Science & Technology Estimator Covariance MULTIVARIATE LOCATION High-dimensional data SCATTER Computational Theory and Mathematics Breakdown value Physical Sciences Computer Science OUTLIER DETECTION Robust covariance estimation Statistics Probability and Uncertainty SDG 12 - Responsible Consumption and Production Algorithm MATRIX Mathematics |
Source: | Boudt, K., Rousseeuw, P. J., Vanduffel, S. & Verdonck, T. (2020). The minimum regularized covariance determinant estimator. Statistics and Computing, 30(1), 113-128. Springer Netherlands. https://doi.org/10.1007/s11222-019-09869-x |
ISSN: | 0960-3174 |
DOI: | 10.1007/s11222-019-09869-x |
Description: | © 2019, Springer Science+Business Media, LLC, part of Springer Nature. The minimum covariance determinant (MCD) approach estimates the location and scatter matrix using the subset of a given size with the lowest sample covariance determinant. Its main drawback is that it cannot be applied when the dimension exceeds the subset size. We propose the minimum regularized covariance determinant (MRCD) approach, which differs from the MCD in that the scatter matrix is a convex combination of a target matrix and the sample covariance matrix of the subset. A data-driven procedure sets the weight of the target matrix, so that the regularization is only used when needed. The MRCD estimator is defined in any dimension, is well-conditioned by construction and preserves the good robustness properties of the MCD. We prove that so-called concentration steps can be performed to reduce the MRCD objective function, and we exploit this fact to construct a fast algorithm. We verify the accuracy and robustness of the MRCD estimator in a simulation study and illustrate its practical use for outlier detection and regression analysis on real-life high-dimensional data sets in chemistry and criminology. |
Database: | OpenAIRE |
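The Description states two technical ideas: the MRCD scatter matrix is a convex combination of a target matrix and the subset's sample covariance, and concentration steps (C-steps) reduce the objective. The following is a minimal NumPy sketch of both, under simplifying assumptions that are not taken from the paper: the weight `rho` is fixed by hand rather than set by the data-driven procedure, the target is assumed to be the identity matrix, no rescaling or consistency correction is applied, and the helper names `regularized_scatter`, `log_objective` and `c_step` are hypothetical. It illustrates the idea and is not the authors' published algorithm.

```python
import numpy as np


def regularized_scatter(X_sub, target, rho):
    """Convex combination rho * T + (1 - rho) * S_H of a target and the subset covariance."""
    S = np.cov(X_sub, rowvar=False)  # sample covariance of the h rows in the subset
    return rho * target + (1.0 - rho) * S


def log_objective(X, subset, target, rho):
    """Log-determinant of the regularized scatter (monotone in the determinant)."""
    sign, logdet = np.linalg.slogdet(regularized_scatter(X[subset], target, rho))
    return logdet


def c_step(X, subset, target, rho):
    """One concentration step: refit on the subset, keep the h closest observations."""
    h = len(subset)
    mu = X[subset].mean(axis=0)
    K = regularized_scatter(X[subset], target, rho)
    diff = X - mu
    # Squared Mahalanobis distances (x_i - mu)^T K^{-1} (x_i - mu) for every row
    d2 = np.einsum("ij,ij->i", diff, np.linalg.solve(K, diff.T).T)
    return np.sort(np.argsort(d2)[:h])


# Hypothetical example: p = 60 variables but only h = 20 points per subset
rng = np.random.default_rng(0)
X = rng.standard_normal((40, 60))
X[:5] += 8.0                      # shift five rows to act as outliers
target = np.eye(X.shape[1])       # assumed target matrix: identity
rho = 0.5                         # fixed by hand here, not data-driven
subset = np.sort(rng.choice(X.shape[0], size=20, replace=False))
for _ in range(10):
    subset = c_step(X, subset, target, rho)
print(subset, log_objective(X, subset, target, rho))
```

With `rho > 0` and a positive definite target, the regularized scatter stays invertible even though p = 60 exceeds h = 20, the setting in which the plain MCD cannot be computed. For a fixed `rho` and target, recording `log_objective` after each call to `c_step` should give a non-increasing sequence.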