Author:
Jeong, Sanghun; Kim, Choongrak; Yang, Hojin
Source:
Journal of Nonparametric Statistics; Sep 2024, Vol. 36, Issue 3, pp. 623-642 (20 pp.)
Abstract:
The aim of this paper is to develop a marginal screening method for variable screening in high-dimensional binary classification, based on the Wasserstein distance, that accounts for the distributional difference between the two classes. Many existing screening methods, such as the two-sample t-test and the Kolmogorov test, have been developed under parametric or nonparametric modeling assumptions to reduce the dimension of the predictors. However, both kinds of approaches work with the probability measure induced by the predictor in Euclidean space. Motivated by the success of many machine learning methods in finding nonlinear decision boundaries in a transformed space, the reproducing kernel Hilbert space (RKHS), we consider the Wasserstein filter's capacity to detect the distributional difference between the two probability measures induced by a nonlinear function of the predictor in the RKHS. This lets us flexibly filter out the predictors that are non-informative for the binary classification while avoiding the modeling assumptions required in Euclidean space. We prove that the Wasserstein filter satisfies the sure screening property under mild conditions. We also demonstrate the advantages of the proposed approach by comparing its finite-sample performance with that of existing methods in simulation studies and in an application to lung cancer data. [ABSTRACT FROM AUTHOR]
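A minimal sketch of the general idea of marginal Wasserstein screening, under simplifying assumptions: each predictor is scored by the 1-Wasserstein distance between its empirical distributions in class 0 and class 1, and the d highest-scoring predictors are kept. This works on the raw predictor values in Euclidean space and does NOT implement the RKHS feature map described in the abstract; the function names `wasserstein_1d` and `wasserstein_screen` and the top-d cutoff are hypothetical illustrations, not the authors' method.

```python
from bisect import bisect_right


def wasserstein_1d(u, v):
    """1-Wasserstein distance between the empirical distributions of samples u and v.

    In one dimension, W1 equals the integral of |F_u(x) - F_v(x)| over x,
    where F_u and F_v are the empirical CDFs.
    """
    if not u or not v:
        raise ValueError("both samples must be non-empty")
    us, vs = sorted(u), sorted(v)
    grid = sorted(set(us) | set(vs))  # merged support points
    dist = 0.0
    for left, right in zip(grid, grid[1:]):
        # Both empirical CDFs are constant on [left, right).
        fu = bisect_right(us, left) / len(us)
        fv = bisect_right(vs, left) / len(vs)
        dist += abs(fu - fv) * (right - left)
    return dist


def wasserstein_screen(X, y, d):
    """Hypothetical marginal filter: rank the columns of X by the W1 distance
    between their class-0 and class-1 values and return the top-d indices
    together with all scores."""
    p = len(X[0])
    scores = []
    for j in range(p):
        x0 = [row[j] for row, label in zip(X, y) if label == 0]
        x1 = [row[j] for row, label in zip(X, y) if label == 1]
        scores.append(wasserstein_1d(x0, x1))
    ranked = sorted(range(p), key=lambda j: scores[j], reverse=True)
    return ranked[:d], scores


# Toy data: column 0 separates the classes, column 1 does not.
X = [[0.0, 5.0], [0.1, 5.2], [3.0, 5.1], [3.2, 4.9]]
y = [0, 0, 1, 1]
selected, scores = wasserstein_screen(X, y, d=1)
print(selected, [round(s, 2) for s in scores])  # [0] [3.05, 0.1]
```

Because the score is a distance between whole distributions rather than a difference in means, a predictor whose classes differ only in spread or shape can still receive a large score; the RKHS version in the paper extends this idea to nonlinear transformations of the predictor.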
Database:
Complementary Index