MurTree: Optimal Decision Trees via Dynamic Programming and Search

Autor: Demirović, E., Lukina, A., Hebrard, E., Jeffrey Chan, Bailey, J., Leckie, C., Ramamohanarao, K., Stuckey, P. J.
Přispěvatelé: Delft University of Technology (TU Delft), Équipe Recherche Opérationnelle, Optimisation Combinatoire et Contraintes (LAAS-ROC), Laboratoire d'analyse et d'architecture des systèmes (LAAS), Université Toulouse Capitole (UT Capitole), Université de Toulouse (UT)-Université de Toulouse (UT)-Institut National des Sciences Appliquées - Toulouse (INSA Toulouse), Institut National des Sciences Appliquées (INSA)-Université de Toulouse (UT)-Institut National des Sciences Appliquées (INSA)-Université Toulouse - Jean Jaurès (UT2J), Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3), Université de Toulouse (UT)-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université de Toulouse (UT)-Université Toulouse Capitole (UT Capitole), Université de Toulouse (UT), Royal Melbourne Institute of Technology University (RMIT University), University of Melbourne, Monash University [Melbourne]
Jazyk: angličtina
Rok vydání: 2022
Předmět:
Zdroj: Journal of Machine Learning Research
Journal of Machine Learning Research, 2022, 23 (26), pp.1-47
Scopus-Elsevier
Journal of Machine Learning Research, 23(26)
ISSN: 1532-4435
1533-7928
Popis: International audience; Decision tree learning is a widely used approach in machine learning, favoured in applications that require concise and interpretable models. Heuristic methods are traditionally used to quickly produce models with reasonably high accuracy. A commonly criticised point, however, is that the resulting trees may not necessarily be the best representation of the data in terms of accuracy and size. In recent years, this motivated the development of optimal classification tree algorithms that globally optimise the decision tree in contrast to heuristic methods that perform a sequence of locally optimal decisions. We follow this line of work and provide a novel algorithm for learning optimal classification trees based on dynamic programming and search. Our algorithm supports constraints on the depth of the tree and number of nodes. The success of our approach is attributed to a series of specialised techniques that exploit properties unique to classification trees. Whereas algorithms for optimal classification trees have traditionally been plagued by high runtimes and limited scalability, we show in a detailed experimental study that our approach uses only a fraction of the time required by the state-of-the-art and can handle datasets with tens of thousands of instances, providing several orders of magnitude improvements and notably contributing towards the practical use of optimal decision trees.
Databáze: OpenAIRE