Ranking Forests
Author: | Clémençon, Stéphan; Depecker, Marine; Vayatis, Nicolas |
---|---|
Contributors: | Laboratoire Traitement et Communication de l'Information (LTCI), Institut Mines-Télécom [Paris] (IMT)-Télécom Paris, Département Images, Données, Signal (IDS), Télécom ParisTech, Télécom ParisTech-Institut Mines-Télécom [Paris] (IMT)-Centre National de la Recherche Scientifique (CNRS), Centre de Mathématiques et de Leurs Applications (CMLA), École normale supérieure - Cachan (ENS Cachan)-Centre National de la Recherche Scientifique (CNRS), Clémençon, Stephan |
Language: | English |
Publication year: | 2013 |
Subject: |
[MATH] Mathematics [math]; [MATH.MATH-PR] Mathematics [math]/Probability [math.PR]; [MATH.MATH-ST] Mathematics [math]/Statistics [math.ST]; [STAT.ML] Statistics [stat]/Machine Learning [stat.ML]
nonparametric scoring; bagging; rank aggregation; AUC criterion; tree-based ranking rules; classification data; ROC optimization; median ranking; bootstrap; bipartite ranking; feature randomization |
Source: | Journal of Machine Learning Research, Microtome Publishing, 2013, 14, pp. 39-73 |
ISSN: | 1532-4435 1533-7928 |
Description: | The present paper examines how the aggregation and feature randomization principles underlying the RANDOM FOREST algorithm (Breiman, 2001) can be adapted to bipartite ranking. The approach taken here is based on nonparametric scoring and ROC curve optimization in the sense of the AUC criterion. In this problem, aggregation is used to increase the performance of scoring rules produced by ranking trees, such as those developed in Clémençon and Vayatis (2009c). The present work describes the principles for building median scoring rules based on concepts from rank aggregation. Consistency results are derived for these aggregated scoring rules, and an algorithm called RANKING FOREST is presented. Furthermore, various strategies for feature randomization are explored through a series of numerical experiments on artificial data sets. |
Database: | OpenAIRE |
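
The aggregation idea summarized in the record above (bootstrap resampling, feature randomization, and rank aggregation evaluated by the AUC criterion) can be illustrated with a small sketch. The Python below is not the RANKING FOREST algorithm of the paper. As labeled assumptions, the base learner `fit_base_scorer` (a difference-of-class-means score) stands in for a ranking tree, and mean-rank (Borda-style) aggregation stands in for the median ranking. All function names, parameters, and the toy data are hypothetical.

```python
import random


def auc(scores, labels):
    """Empirical AUC: share of (positive, negative) pairs ordered correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ranks(scores):
    """Rank of each item (0 = lowest score); tied items share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2
        i = j + 1
    return result


def fit_base_scorer(X, y, features):
    """Hypothetical base learner (assumption, not a ranking tree):
    weight each chosen feature by the gap between class means."""
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    w = {f: sum(x[f] for x in pos) / len(pos) - sum(x[f] for x in neg) / len(neg)
         for f in features}
    return lambda x: sum(w[f] * x[f] for f in features)


def fit_ensemble(X, y, n_scorers=25, n_features=1, seed=0):
    """Fit each base scorer on a bootstrap sample restricted to a random feature subset."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    scorers = []
    while len(scorers) < n_scorers:
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap: draw n rows with replacement
        yb = [y[i] for i in idx]
        if len(set(yb)) < 2:                         # skip samples missing one of the classes
            continue
        feats = rng.sample(range(d), n_features)     # feature randomization
        scorers.append(fit_base_scorer([X[i] for i in idx], yb, feats))
    return scorers


def aggregate_scores(scorers, X):
    """Borda-style aggregation: average each item's rank over all base scorers."""
    per_scorer = [ranks([s(x) for x in X]) for s in scorers]
    return [sum(col) / len(scorers) for col in zip(*per_scorer)]


# Toy, linearly separable data (made up for illustration only).
X_toy = [[2.0, 3.0], [3.0, 2.0], [2.5, 2.5], [0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
y_toy = [1, 1, 1, 0, 0, 0]
forest = fit_ensemble(X_toy, y_toy)
print("AUC of aggregated ranking:", auc(aggregate_scores(forest, X_toy), y_toy))
```

Averaging ranks rather than raw scores makes the aggregate depend only on the ordering each base scorer induces, which is the quantity the AUC measures. A closer analogue of the paper's approach would replace the mean with a median ranking under a chosen distance between rankings.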