Machine learning for credit scoring: Improving logistic regression with non-linear decision-tree effects
Authors: | Sessi Tokpavi, Elena Ivona Dumitrescu, Christophe Hurlin, Sullivan Hué |
---|---|
Contributors: | EconomiX, Université Paris Nanterre (UPN)-Centre National de la Recherche Scientifique (CNRS), Laboratoire d'Économie d'Orléans (LEO), Université d'Orléans (UO)-Université de Tours (UT), Aix-Marseille Sciences Economiques (AMSE), École des hautes études en sciences sociales (EHESS)-Aix Marseille Université (AMU)-École Centrale de Marseille (ECM)-Centre National de la Recherche Scientifique (CNRS), Chair ACPR/Risk Foundation: Regulation and Systemic Risk, ANR-16-CE26-0015, MultiRisk, Méthodes Econométriques pour la Modélisation des Risques Multiples (2016), ANR-19-CE26-0002, CaLiBank, L'industrie bancaire de l'après crise : comment les banques vont-elles réagir aux contraintes réglementaires plus strictes ? (2019), École des hautes études en sciences sociales (EHESS)-École Centrale de Marseille (ECM)-Centre National de la Recherche Scientifique (CNRS)-Aix Marseille Université (AMU) |
Publication year: | 2022 |
Subjects: | Information Systems and Management; Credit default swap; General Computer Science; Computer science; Decision tree; Credit scoring; Context (language use); Management Science and Operations Research; [SHS.ECO]Humanities and Social Sciences/Economics and Finance; Logistic regression; Ensemble learning; Industrial and Manufacturing Engineering; Random forest; Risk management; Modeling and Simulation; Machine learning; Statistics; Interpretability; Econometrics; Credit risk |
Source: | European Journal of Operational Research, Elsevier, 2022, 297 (3), pp. 1178-1192. ⟨10.1016/j.ejor.2021.06.053⟩ |
ISSN: | 0377-2217 1872-6860 |
DOI: | 10.1016/j.ejor.2021.06.053 |
Description: | In the context of credit scoring, ensemble methods based on decision trees, such as the random forest method, provide better classification performance than standard logistic regression models. However, logistic regression remains the benchmark in the credit risk industry mainly because the lack of interpretability of ensemble methods is incompatible with the requirements of financial regulators. In this paper, we propose a high-performance and interpretable credit scoring method called penalised logistic tree regression (PLTR), which uses information from decision trees to improve the performance of logistic regression. Formally, rules extracted from various short-depth decision trees built with original predictive variables are used as predictors in a penalised logistic regression model. PLTR allows us to capture non-linear effects that can arise in credit scoring data while preserving the intrinsic interpretability of the logistic regression model. Monte Carlo simulations and empirical applications using four real credit default datasets show that PLTR predicts credit risk significantly more accurately than logistic regression and compares competitively to the random forest method. |
Database: | OpenAIRE |
External link: |
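The abstract above describes using rules from short-depth decision trees as predictors in a penalised logistic regression. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes depth-one trees (a single Gini-optimal threshold per variable), a plain L1 penalty fitted by proximal gradient descent, and synthetic data; the record specifies none of these, and all function names, parameters, and data here are hypothetical.

```python
import math
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_threshold(values, labels):
    """Depth-one tree (assumption): threshold minimising weighted Gini impurity."""
    best_t, best_score = None, float("inf")
    candidates = sorted(set(values))
    for lo, hi in zip(candidates, candidates[1:]):
        t = (lo + hi) / 2.0
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_t, best_score = t, score
    return best_t


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(w, lam):
    """Proximal operator of the L1 penalty."""
    return math.copysign(max(abs(w) - lam, 0.0), w)


def fit_l1_logistic(X, y, lam=0.01, lr=0.5, n_iter=500):
    """L1-penalised logistic regression via proximal gradient descent."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for row, target in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - target
            grad_b += err / n
            for j, xj in enumerate(row):
                grad_w[j] += err * xj / n
        b -= lr * grad_b  # the intercept is not penalised
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b


# Synthetic data (hypothetical): default risk jumps when x0 exceeds 0.3.
random.seed(0)
n = 300
X = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(n)]
y = [1 if random.random() < (0.9 if row[0] > 0.3 else 0.1) else 0 for row in X]

# Step 1: extract one threshold rule per original variable.
thresholds = [best_threshold([row[j] for row in X], y) for j in range(2)]

# Step 2: append the binary rule indicators to the original variables.
X_aug = [row + [1.0 if row[j] <= thresholds[j] else 0.0 for j in range(2)]
         for row in X]

# Step 3: fit the penalised logistic regression on the augmented design.
weights, intercept = fit_l1_logistic(X_aug, y)

# In-sample accuracy, for illustration only.
preds = [1 if sigmoid(intercept + sum(w * v for w, v in zip(weights, row))) > 0.5
         else 0 for row in X_aug]
accuracy = sum(int(p == t) for p, t in zip(preds, y)) / n
print("thresholds:", thresholds)
print("weights:", weights)
print("in-sample accuracy:", accuracy)
```

Because each rule enters the model as an ordinary binary regressor, its fitted coefficient can be read like any other logistic regression coefficient, which is the interpretability argument the abstract makes. A fuller version would likely also extract rules from deeper trees on pairs of variables and assess accuracy out of sample.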