Learning comprehensible and accurate hybrid trees
Author: | Matjaž Gams, Rok Piltaver, Mitja Luštrek, Martin Gjoreski, Sašo Džeroski |
---|---|
Publication year: | 2021 |
Subject: | 0209 industrial biotechnology; Computer science; business.industry; Decision tree learning; General Engineering; Pareto principle; 02 engineering and technology; Machine learning; computer.software_genre; Computer Science Applications; Subject-matter expert; 020901 industrial engineering & automation; Artificial Intelligence; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; business; computer; Classifier (UML) |
Source: | Expert Systems with Applications. 164:113980 |
ISSN: | 0957-4174 |
DOI: | 10.1016/j.eswa.2020.113980 |
Description: | Finding the best classifiers according to different criteria is often performed by a multi-objective machine learning algorithm. This study considers two criteria that are usually treated as the most important when deciding which classifier to apply in practice: comprehensibility and accuracy. Because the two criteria conflict, i.e., increasing one decreases the other, a model that offers a broad range of trade-offs between them is introduced. The choice of the model is motivated by the fact that domain experts often formalize decisions based on knowledge that can be represented by comprehensible rules and some tacit knowledge. This approach is mimicked by a hybrid tree that consists of comprehensible parts that originate from a regular classification tree and incomprehensible parts that originate from an accurate black-box classifier. An empirical evaluation on 23 UCI datasets shows that the hybrid trees provide trade-offs between accuracy and comprehensibility that are not possible using traditional machine learning models. A corresponding hybrid-tree comprehensibility metric is also proposed. Furthermore, the paper presents a novel algorithm for learning MAchine LeArning Classifiers with HybrId TrEes (MALACHITE), and it proves that the algorithm finds a complete set of nondominated hybrid trees with regard to their accuracy and comprehensibility. The algorithm is shown to be faster than the well-known multi-objective evolutionary optimization algorithm NSGA-II for trees of moderate size, which is a prerequisite for comprehensibility. On the other hand, the MALACHITE algorithm can generate considerably larger hybrid trees than a naive exhaustive search algorithm in a reasonable amount of time. In addition, an interactive iterative data mining process based on the algorithm is proposed that enables inspection of the Pareto set of hybrid trees. In each iteration, the domain expert analyzes the current set of nondominated hybrid trees, infers domain relations, and sets the parameters for the next machine learning step accordingly. |
Database: | OpenAIRE |
External link: |
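
The description refers to a set of nondominated hybrid trees with regard to accuracy and comprehensibility. As a minimal sketch of what nondominated means for two criteria, the Python snippet below filters a list of candidate models down to its Pareto set. It is not the MALACHITE algorithm; the `Candidate` type, the model names, and all scores are made-up illustrative values, and comprehensibility is assumed to be a score where higher is better.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """Hypothetical candidate model with two scores to maximize."""
    name: str
    accuracy: float           # higher is better
    comprehensibility: float  # higher is better (assumed scale, not the paper's metric)


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if a is at least as good as b on both criteria
    and strictly better on at least one."""
    return (
        a.accuracy >= b.accuracy
        and a.comprehensibility >= b.comprehensibility
        and (a.accuracy > b.accuracy or a.comprehensibility > b.comprehensibility)
    )


def nondominated(candidates: list[Candidate]) -> list[Candidate]:
    """Keep every candidate that no other candidate dominates (quadratic scan)."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Made-up scores, for illustration only.
candidates = [
    Candidate("full tree", 0.80, 1.00),
    Candidate("hybrid A", 0.86, 0.70),
    Candidate("hybrid B", 0.84, 0.60),  # worse than hybrid A on both criteria
    Candidate("black box", 0.90, 0.00),
]

front = nondominated(candidates)
print([c.name for c in front])
```

With these values, "hybrid B" is dropped because "hybrid A" is better on both criteria, while the remaining three models each represent a different trade-off.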