Adjusted Pearson Chi-Square feature screening for multi-classification with ultrahigh dimensional data
Author: | Lyu Ni, Fangjiao Wan, Fang Fang |
---|---|
Publication year: | 2017 |
Subject: |
Statistics and Probability; Statistics, Probability and Uncertainty; Pearson's chi-squared test; Modified method; Covariate; p-value; Data mining; Feature screening; Categorical variable; Selection (genetic algorithm); Statistics; Mathematics; 01 natural sciences; 0101 mathematics; 010104 statistics & probability; 05 social sciences; 0502 economics and business; 050205 econometrics |
Source: | Metrika. 80:805-828 |
ISSN: | 1435-926X 0026-1335 |
Description: | Huang et al. (J Bus Econ Stat 32:237–244, 2014) first proposed a Pearson Chi-Square based feature screening procedure tailored to the multi-classification problem with ultrahigh dimensional categorical covariates, a problem that is common in practice but has seldom been discussed in the literature. However, their work establishes the sure screening property only in a limited setting. Moreover, the p-value based adjustments for the case where covariates have different numbers of categories do not work well in several practical situations. In this paper, we propose an adjusted Pearson Chi-Square feature screening procedure and a modified method for tuning parameter selection. Theoretically, we establish the sure screening property of the proposed method in general settings. Empirically, the proposed method is more successful than Pearson Chi-Square feature screening in handling unequal numbers of covariate categories in finite samples. Results of three simulation studies and one real data analysis are presented. Our work, together with Huang et al. (J Bus Econ Stat 32:237–244, 2014), establishes a solid theoretical foundation and provides empirical evidence for the family of Pearson Chi-Square based feature screening methods. |
Database: | OpenAIRE |
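Illustrative note: the general idea behind Pearson Chi-Square based screening, as summarized in the description, is to score each categorical covariate by its chi-square association with the class label and keep the top-ranked covariates. The following is a minimal sketch of that unadjusted idea only. It does not reproduce the adjusted statistic or the tuning parameter selection proposed in the paper, neither of which is specified in this record. The function names `pearson_chi2` and `screen`, the `keep` parameter, and the toy data are hypothetical.

```python
from collections import Counter


def pearson_chi2(x, y):
    """Pearson chi-square statistic for two categorical sequences."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and of equal length")
    x_counts = Counter(x)          # marginal counts of covariate categories
    y_counts = Counter(y)          # marginal counts of class labels
    joint = Counter(zip(x, y))     # observed cell counts
    stat = 0.0
    for a, na in x_counts.items():
        for b, nb in y_counts.items():
            expected = na * nb / n             # count expected under independence
            observed = joint.get((a, b), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def screen(columns, y, keep):
    """Rank covariates by chi-square / n and return the top `keep` names.

    Assumption: plain, unadjusted ranking. Covariates with more categories
    tend to receive larger scores, which is the issue the abstract raises.
    """
    n = len(y)
    scores = {name: pearson_chi2(col, y) / n for name, col in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:keep], scores


# Toy data (hypothetical): x1 determines the class, x2 is independent of it.
y = ["a", "a", "b", "b", "c", "c"]
columns = {
    "x1": [0, 0, 1, 1, 2, 2],
    "x2": [0, 1, 0, 1, 0, 1],
}
selected, scores = screen(columns, y, keep=1)
print(selected, scores)
```

In this toy example `x1` is ranked first because its categories line up exactly with the classes, while `x2` scores zero. With real data the covariates usually differ in their number of categories, so raw scores are not directly comparable across covariates.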