Learning comprehensible and accurate hybrid trees

Autor: Matjaž Gams, Rok Piltaver, Mitja Luštrek, Martin Gjoreski, Sašo Džeroski
Rok vydání: 2021
Předmět:
Zdroj: Expert Systems with Applications. 164:113980
ISSN: 0957-4174
DOI: 10.1016/j.eswa.2020.113980
Popis: Finding the best classifiers according to different criteria is often performed by a multi-objective machine learning algorithm. This study considers two criteria that are usually treated as the most important when deciding which classifier to apply in practice: comprehensibility and accuracy. A model that offers a broad range of trade-offs between the two criteria is introduced because they conflict; i.e., increasing one decreases the other. The choice of the model is motivated by the fact that domain experts often formalize decisions based on knowledge that can be represented by comprehensible rules and some tacit knowledge. This approach is mimicked by a hybrid tree that consists of comprehensible parts that originate from a regular classification tree and incomprehensible parts that originate from an accurate black-box classifier. An empirical evaluation on 23 UCI datasets shows that the hybrid trees provide trade-offs between the accuracy and comprehensibility that are not possible using traditional machine learning models. A corresponding hybrid-tree comprehensibility metric is also proposed. Furthermore, the paper presents a novel algorithm for learning MAchine LeArning Classifiers with HybrId TrEes (MALACHITE), and it proves that the algorithm finds a complete set of nondominated hybrid trees with regard to their accuracy and comprehensibility. The algorithm is shown to be faster than the well-known multi-objective evolutionary optimization algorithm NSGA-II for trees with moderate size, which is a prerequisite for comprehensibility. On the other hand, the MALACHITE algorithm can generate considerably larger hybrid-trees than a naive exhaustive search algorithm in a reasonable amount of time. In addition, an interactive iterative data mining process based on the algorithm is proposed that enables inspection of the Pareto set of hybrid trees. In each iteration, the domain expert analyzes the current set of nondominated hybrid trees, infers domain relations, and sets the parameters for the next machine learning step accordingly.
Databáze: OpenAIRE