Consistent Spectral Methods for Dimensionality Reduction

Authors: Tabea Rebafka, Nataliya Sokolovska, Malika Kharouf
Contributors: Laboratoire Modélisation et Sûreté des Systèmes (LM2S), Institut Charles Delaunay (ICD), Université de Technologie de Troyes (UTT)-Centre National de la Recherche Scientifique (CNRS), Laboratoire de Probabilités, Statistique et Modélisation (LPSM (UMR_8001)), Université Paris Diderot - Paris 7 (UPD7)-Sorbonne Université (SU)-Centre National de la Recherche Scientifique (CNRS), Institut National de la Santé et de la Recherche Médicale (INSERM), ANR-17-CE23-0006, DiagnoLearn, Apprentissage des modèles interprétables pour le diagnostic médical (2017)
Language: English
Publication year: 2018
Source: 2018 26th European Signal Processing Conference (EUSIPCO), Sep 2018, Rome, Italy, pp. 286-290
DOI: 10.23919/EUSIPCO.2018.8553295
Description: This paper addresses the problem of dimension reduction for noisy data, and more precisely the challenge of determining the dimension of the subspace in which the observed signal lives. Based on results from random matrix theory, two novel estimators of the signal dimension are proposed. Consistency of the estimators is proved in the modern asymptotic regime, where the number of parameters grows proportionally with the sample size. Experimental results show that the novel estimators are robust to noise and give highly accurate results in settings where standard methods fail. We apply the novel dimension estimators to several life sciences benchmarks in the context of classification, and illustrate the improvements achieved by the new methods compared to state-of-the-art approaches.
Database: OpenAIRE