Outlier detection in non-elliptical data by kernel MRCD
Authors: | Schreurs, Joachim; Vranckx, Iwein; De Ketelaere, Bart; Hubert, Mia; Suykens, Johan A. K.; Rousseeuw, Peter J. |
---|---|
Year of publication: | 2020 |
Subject: | Statistics and Probability; FOS: Computer and information sciences; Computer Science - Machine Learning; Computer science; Robust statistics; Machine Learning (stat.ML); 02 engineering and technology; 01 natural sciences; Statistics - Computation; Theoretical Computer Science; Machine Learning (cs.LG); 010104 statistics & probability; Statistics - Machine Learning; 0202 electrical engineering, electronic engineering, information engineering; Determinant method; 0101 mathematics; Computation (stat.CO); Covariance matrix; Estimator; Covariance; Kernel method; Computational Theory and Mathematics; Kernel (statistics); Outlier; 020201 artificial intelligence & image processing; Statistics, Probability and Uncertainty; Algorithm |
Source: | Statistics and Computing |
DOI: | 10.48550/arxiv.2008.02046 |
Description: | The minimum regularized covariance determinant method (MRCD) is a robust estimator for multivariate location and scatter, which detects outliers by fitting a robust covariance matrix to the data. Its regularization ensures that the covariance matrix is well-conditioned in any dimension. The MRCD assumes that the non-outlying observations are roughly elliptically distributed, but many datasets are not of that form. Moreover, the computation time of MRCD increases substantially when the number of variables goes up, and nowadays datasets with many variables are common. The proposed kernel minimum regularized covariance determinant (KMRCD) estimator addresses both issues. It is not restricted to elliptical data because it implicitly computes the MRCD estimates in a kernel-induced feature space. A fast algorithm is constructed that starts from kernel-based initial estimates and exploits the kernel trick to speed up the subsequent computations. Based on the KMRCD estimates, a rule is proposed to flag outliers. The KMRCD algorithm performs well in simulations, and is illustrated on real-life data. |
Database: | OpenAIRE |
External link: |
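
The description mentions computing estimates implicitly in a kernel-induced feature space. As a rough illustration of that kernel trick only (this is not the KMRCD algorithm, and it is not the paper's outlier-flagging rule), the sketch below computes the squared feature-space distance from each observation to the mean of a chosen subset using nothing but kernel evaluations. The function names `rbf_kernel` and `feature_space_sq_dist`, the Gaussian kernel, and the bandwidth `sigma` are assumptions made for this example.

```python
import numpy as np


def rbf_kernel(X, Y, sigma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and the rows of Y.

    The kernel choice and the bandwidth `sigma` are illustrative assumptions.
    """
    sq = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def feature_space_sq_dist(K, subset):
    """Squared feature-space distance of every point to the mean of `subset`.

    With m the feature-space mean of the h points in `subset`:
        ||phi(x_i) - m||^2
            = K[i, i] - (2 / h) * sum_j K[i, j] + (1 / h^2) * sum_{j, l} K[j, l]
    where j and l run over `subset`. Only the kernel matrix K is needed;
    the feature map phi is never evaluated explicitly.
    """
    subset = np.asarray(subset)
    h = subset.size
    cross = K[:, subset].sum(axis=1)          # sum_j K[i, j] for each i
    within = K[np.ix_(subset, subset)].sum()  # sum_{j, l} K[j, l]
    return np.diag(K) - 2.0 * cross / h + within / h**2


# Hypothetical usage: distances of all points to the mean of the first 50.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
K = rbf_kernel(X, X, sigma=1.5)
d2 = feature_space_sq_dist(K, np.arange(50))
```

With a linear kernel (`K = X @ X.T`) the same function reduces to the ordinary squared Euclidean distance to the subset mean, which gives a simple way to sanity-check it. Points with comparatively large distances are candidates for closer inspection, but any actual cutoff would have to come from the paper itself.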