Multivariate Location and Scatter Matrix Estimation Under Cellwise and Casewise Contamination
Authors: | Victor J. Yohai, Andy Leung, Ruben H. Zamar |
---|---|
Year of publication: | 2016 |
Subject: | Statistics and Probability; Multivariate statistics; Weight function; Computer science; Mathematics; Mathematics - Statistics Theory; Bivariate analysis; Statistics Theory (math.ST); 01 natural sciences; Pure Mathematics; 010104 statistics & probability; 03 medical and health sciences; 0302 clinical medicine; Dimension (vector space); Scatter matrix; FOS: Mathematics; 030212 general & internal medicine; Cellwise outliers; Multivariate location and scatter; 0101 mathematics; Robust estimation; Applied Mathematics; Univariate; 62G35; 62G05; 62G20; Filter (signal processing); Computational Mathematics; Computational Theory and Mathematics; Outlier; Componentwise contamination; Algorithm; Natural and exact sciences |
DOI: | 10.48550/arxiv.1609.00402 |
Description: | Real data may contain both cellwise and casewise outliers. There is a vast literature on robust estimation under casewise outliers, but only a scant one for cellwise outliers and almost none for both types together. Estimation of the multivariate location and scatter matrix is a cornerstone of multivariate data analysis. A two-step approach was recently proposed to estimate multivariate location and scatter robustly in the presence of both cellwise and casewise outliers: in the first step, a univariate filter was applied to remove cellwise outliers; in the second step, a generalized S-estimator was used to downweight casewise outliers. This proposal can be improved in three main directions. First, a consistent bivariate filter is introduced, to be used together with the univariate filter in the first step. Second, a new fast subsampling procedure is proposed to generate starting points for the generalized S-estimator in the second step. Third, a non-monotonic weight function is used in the generalized S-estimator to better handle casewise outliers in high dimension. A simulation study and a real-data example show that, unlike the original two-step procedure, the modified two-step approach performs well and scales well in high dimension. They also show that the modified procedure outperforms both the original one and other state-of-the-art robust procedures under cellwise and casewise contamination. |
Affiliations: | Leung, Andy: University of British Columbia, Canada; Yohai, Victor Jaime: Consejo Nacional de Investigaciones Científicas y Técnicas, Argentina, and Universidad de Buenos Aires, Facultad de Ciencias Exactas y Naturales, Departamento de Matemática, Argentina; Zamar, Ruben Horacio: University of British Columbia, Canada |
Database: | OpenAIRE |
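The first step of the two-step approach summarized in the description filters cellwise outliers before any multivariate estimation. The following is a minimal Python sketch of that idea only. It assumes a simple fixed-cutoff rule (robust z-scores from the median and the normalized MAD, with a hypothetical cutoff of 3) as a stand-in for the paper's univariate filter. The function name `univariate_filter` is illustrative. Neither the bivariate filter nor the generalized S-estimator is implemented.

```python
from statistics import median

# Scale factor that makes the MAD a consistent estimate of the standard
# deviation when the data are normally distributed.
MAD_CONSISTENCY = 1.4826


def univariate_filter(rows, cutoff=3.0):
    """Flag cellwise outliers column by column (hypothetical simplified filter).

    `rows` is a list of equal-length lists of floats; None marks a missing cell.
    A cell is flagged when its robust z-score |x - median| / (1.4826 * MAD)
    exceeds `cutoff`. Flagged cells are replaced by None in a copy of the data.
    """
    out = [list(r) for r in rows]  # work on a copy; the input is left untouched
    for j in range(len(rows[0])):
        col = [r[j] for r in rows if r[j] is not None]
        if not col:
            continue  # column is entirely missing: nothing to filter
        med = median(col)
        mad = MAD_CONSISTENCY * median([abs(x - med) for x in col])
        if mad == 0:
            continue  # zero robust spread: skip rather than divide by zero
        for i, r in enumerate(rows):
            x = r[j]
            if x is not None and abs(x - med) / mad > cutoff:
                out[i][j] = None  # treat the flagged cell as missing
    return out


if __name__ == "__main__":
    data = [
        [1.0, 2.0],
        [2.0, 3.0],
        [3.0, 4.0],
        [4.0, 5.0],
        [5.0, 100.0],  # 100.0 is a planted cellwise outlier
    ]
    print(univariate_filter(data))
    # [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [4.0, 5.0], [5.0, None]]
```

Only the offending cell is set to missing. The other coordinate of that observation (5.0) is kept, and this is what distinguishes a cellwise filter from casewise downweighting of the whole row. In the approach described above, the resulting missing cells are then handled by the generalized S-estimator, which this sketch does not attempt.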