Machine Learning or Econometrics for Credit Scoring: Let's Get the Best of Both Worlds
Authors: | Sullivan Hué, Sessi Tokpavi, Christophe Hurlin, Elena Ivona Dumitrescu |
---|---|
Contributors: | Hué, Sullivan, EconomiX, Université Paris Nanterre (UPN)-Centre National de la Recherche Scientifique (CNRS), Laboratoire d'Économie d'Orleans (LEO), Université d'Orléans (UO)-Université de Tours (UT), University of Orleans, LEO, Laboratoire d'économie d'Orleans (LEO), Université d'Orléans (UO)-Centre National de la Recherche Scientifique (CNRS) |
Language: | English |
Publication year: | 2021 |
Subject: | Credit default swap; Computer science; Decision tree; Context (language use); Credit scoring; Logistic regression; Machine learning; Econometrics; Interpretability; Ensemble learning; Random forest; Risk management; Artificial intelligence; Credit risk; 01 natural sciences; 0101 mathematics; 010104 statistics & probability; 02 engineering and technology; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; [SHS] Humanities and Social Sciences; [SHS.ECO] Humanities and Social Sciences/Economics and Finance |
Description: | In the context of credit scoring, ensemble methods based on decision trees, such as the random forest method, provide better classification performance than standard logistic regression models. However, logistic regression remains the benchmark in the credit risk industry, mainly because the lack of interpretability of ensemble methods is incompatible with the requirements of financial regulators. In this paper, we propose to obtain the best of both worlds by introducing a high-performance and interpretable credit scoring method called penalised logistic tree regression (PLTR), which uses information from decision trees to improve the performance of logistic regression. Formally, rules extracted from various short-depth decision trees built with pairs of predictive variables are used as predictors in a penalised logistic regression model. PLTR allows us to capture non-linear effects that can arise in credit scoring data while preserving the intrinsic interpretability of the logistic regression model. Monte Carlo simulations and empirical applications using four real credit default datasets show that PLTR predicts credit risk significantly more accurately than logistic regression and compares competitively to the random forest method. JEL Classification: G10, C25, C53 |
Database: | OpenAIRE |
External link: |
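
The description says that rules from short-depth decision trees, each built on a pair of predictive variables, enter a penalised logistic regression as predictors. The following is a minimal Python sketch of that general idea, not the authors' implementation. It assumes scikit-learn and NumPy are available, represents each rule as a leaf-membership indicator, and uses a plain L1 penalty, which may differ from the penalty used in the paper. The function names (`leaf_indicators`, `augment`, `fit_rule_logit`), the tree depth of 2, and the synthetic data are all illustrative assumptions.

```python
from itertools import combinations

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier


def leaf_indicators(tree, X_pair):
    """Return 0/1 columns marking which leaf of `tree` each row falls into.

    The first leaf is dropped so it acts as the reference category.
    """
    leaves = np.flatnonzero(tree.tree_.children_left == -1)  # node ids of leaves
    leaf_of_row = tree.apply(X_pair)  # leaf node id reached by each row
    return (leaf_of_row[:, None] == leaves[None, 1:]).astype(float)


def fit_pair_trees(X, y, max_depth=2):
    """Fit one shallow tree per pair of columns (assumed depth, not from the paper)."""
    trees = []
    for pair in combinations(range(X.shape[1]), 2):
        tree = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
        tree.fit(X[:, list(pair)], y)
        trees.append((pair, tree))
    return trees


def augment(trees, X):
    """Stack the original variables with the rule indicators from every tree."""
    rules = [leaf_indicators(tree, X[:, list(pair)]) for pair, tree in trees]
    return np.hstack([X] + rules)


def fit_rule_logit(X, y, C=1.0):
    """Fit an L1-penalised logistic regression on variables plus tree rules."""
    trees = fit_pair_trees(X, y)
    model = LogisticRegression(penalty="l1", solver="liblinear", C=C)
    model.fit(augment(trees, X), y)
    return trees, model


# Synthetic illustration: the "default" flag depends on a threshold
# interaction between columns 0 and 1, which a linear logit cannot express.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = ((X[:, 0] > 0) & (X[:, 1] > 0)).astype(int)

trees, model = fit_rule_logit(X, y)
Z = augment(trees, X)
scores = model.predict_proba(Z)[:, 1]  # estimated default probabilities
```

Because every added predictor is a binary indicator with its own coefficient, the fitted model can still be read like an ordinary logistic regression, and the L1 penalty sets the coefficients of uninformative rules to zero.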