Finding structure in data using multivariate tree boosting.

Authors: Miller PJ (Department of Psychology, University of Notre Dame); Lubke GH (Department of Psychology, University of Notre Dame); McArtor DB (Department of Psychology, University of Notre Dame); Bergeman CS (Department of Psychology, University of Notre Dame)
Language: English
Source: Psychological methods [Psychol Methods] 2016 Dec; Vol. 21 (4), pp. 583-602.
DOI: 10.1037/met0000087
Abstract: Technology and collaboration enable dramatic increases in the size of psychological and psychiatric data collections, but finding structure in these large data sets with many collected variables is challenging. Decision tree ensembles such as random forests (Strobl, Malley, & Tutz, 2009) are a useful tool for finding structure, but they are difficult to interpret when there are multiple outcome variables, which are often of interest in psychology. To find and interpret structure in data sets with multiple outcomes and many predictors (possibly exceeding the sample size), we introduce a multivariate extension to a decision tree ensemble method called gradient boosted regression trees (Friedman, 2001). Our extension, multivariate tree boosting, is a method for nonparametric regression that is useful for identifying important predictors, for detecting predictors with nonlinear effects and interactions without specification of such effects, and for identifying predictors that cause 2 or more outcome variables to covary. We provide the R package "mvtboost" to estimate, tune, and interpret the resulting model, which extends the implementation of univariate boosting in the R package "gbm" (Ridgeway, 2015) to continuous, multivariate outcomes. To illustrate the approach, we analyze predictors of psychological well-being (Ryff & Keyes, 1995). Simulations verify that our approach identifies predictors with nonlinear effects and achieves high prediction accuracy, exceeding or matching the performance of (penalized) multivariate multiple regression and multivariate decision trees over a wide range of conditions. (PsycINFO Database Record (c) 2016 APA, all rights reserved.)
Database: MEDLINE
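
The general idea in the abstract, boosting regression trees for several continuous outcomes and then reading off which predictors matter, can be illustrated with a small sketch. This is not the mvtboost or gbm API and not the authors' algorithm: it is a simplified stand-in that fits an independent ensemble of one-split trees (stumps) to each outcome under squared-error loss, using only the Python standard library. The function names (`fit_stump`, `boost_one_outcome`, `boost_multivariate`, `predict`), the tuning values, and the simulated data are all assumptions made for illustration.

```python
import random


def fit_stump(x_rows, residuals):
    """Return the one-split tree with the largest squared-error reduction.

    Result: (gain, feature, threshold, left_mean, right_mean), or None if
    no predictor has two distinct values to split between.
    """
    n = len(residuals)
    total = sum(residuals)
    best = None
    for j in range(len(x_rows[0])):
        order = sorted(range(n), key=lambda i: x_rows[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += residuals[order[k - 1]]
            lo, hi = x_rows[order[k - 1]][j], x_rows[order[k]][j]
            if lo == hi:
                continue  # cannot split between tied values
            right_sum = total - left_sum
            # Reduction in the sum of squared errors produced by this split.
            gain = left_sum ** 2 / k + right_sum ** 2 / (n - k) - total ** 2 / n
            if best is None or gain > best[0]:
                # Threshold is the largest value sent to the left branch.
                best = (gain, j, lo, left_sum / k, right_sum / (n - k))
    return best


def boost_one_outcome(x_rows, y, n_trees=100, shrinkage=0.1):
    """Gradient boosting with stumps for one outcome (squared-error loss)."""
    intercept = sum(y) / len(y)
    pred = [intercept] * len(y)
    stumps = []
    importance = [0.0] * len(x_rows[0])  # summed error reduction per predictor
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        best = fit_stump(x_rows, residuals)
        if best is None:
            break
        gain, j, thr, left_mean, right_mean = best
        importance[j] += gain
        left, right = shrinkage * left_mean, shrinkage * right_mean
        stumps.append((j, thr, left, right))
        pred = [p + (left if row[j] <= thr else right)
                for p, row in zip(pred, x_rows)]
    return intercept, stumps, importance


def boost_multivariate(x_rows, y_cols, **kwargs):
    """Simplification: fit one independent boosted model per outcome."""
    return [boost_one_outcome(x_rows, y, **kwargs) for y in y_cols]


def predict(model, row):
    """Predict one outcome for a single row of predictor values."""
    intercept, stumps, _ = model
    return intercept + sum(l if row[j] <= thr else r for j, thr, l, r in stumps)


# Simulated data (assumed for illustration): x0 has a nonlinear effect on both
# outcomes, x1 has a linear effect on y2 only, and x2 is pure noise.
random.seed(0)
n = 200
x_rows = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(n)]
y1 = [row[0] ** 2 + random.gauss(0, 0.05) for row in x_rows]
y2 = [row[0] ** 2 + row[1] + random.gauss(0, 0.05) for row in x_rows]

models = boost_multivariate(x_rows, [y1, y2])
for name, (_, _, importance) in zip(["y1", "y2"], models):
    total = sum(importance)
    # Relative importance of x0, x1, x2 in percent for this outcome.
    print(name, [round(100 * v / total, 1) for v in importance])
```

In this sketch the quadratic effect of `x0` is recovered without being specified in advance, and a predictor that earns importance for more than one outcome (here `x0`) is a candidate for the kind of shared influence the abstract describes. The published method and its R package provide tuning and interpretation tools that this simplification omits.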