A Salient Ensemble of Trees using Cascaded Linear classifiers with Feature-Cost Constraints

Author: Chien-Wen Huang, 黃建文
Year of publication: 2017
Document type: Thesis (學位論文)
Description: 105
In the field of machine learning, both feature selection and cost-sensitive model training are widely studied. The traditional goal of feature selection is to find features that carry more information with less redundancy, but in many situations features are not free to use. Moreover, in some applications it is important to guarantee that the model never exceeds its cost budget on any test instance, e.g. a real-time application with a limited response time. Researchers have tried to model the trade-off between performance and cost, but they often assume that the cost of each feature is independent of the other features, which is not realistic in practice. In this thesis, we divide feature cost into two categories: individual cost and group cost. The individual cost is the part that is independent of any other feature, such as memory. The group cost is the part that is charged only once, when any feature in that group is extracted. We propose a two-stage algorithm that combines cost-sensitive feature selection with a model trained under a cost budget constraint. In the first stage, a cost-sensitive feature selection algorithm based on the idea of random forest, the group-cost-sensitive random forest (GOAT) algorithm, considers both individual cost and group cost. In the second stage, once a suitable feature subset has been selected, the algorithm uses the selected features to build a salient ensemble of trees, each of which uses cascaded linear classifiers (ETIC). The ETIC model is trained so that the feature-cost constraints are satisfied. It applies multiple features in each node, which makes it more powerful than a traditional random forest, which uses only a single feature in each node. In the experiments, we compare the proposed algorithm with several baselines on real data, including user preference data and object detection data. When the group cost dominates, our GOAT-ETIC model achieves a 10 to 30% improvement over the baseline algorithms.
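The two-part cost model described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the dictionary layout, the feature names, and the numeric costs are all hypothetical, chosen only to show that a group cost is charged once no matter how many features of that group are used.

```python
from typing import Dict, Iterable


def total_feature_cost(
    selected: Iterable[str],
    individual_cost: Dict[str, float],
    group_of: Dict[str, str],
    group_cost: Dict[str, float],
) -> float:
    """Cost of extracting `selected` under an individual + group cost model.

    Assumed layout (hypothetical, for illustration only):
      individual_cost[f] -- cost paid for every selected feature f
      group_of[f]        -- name of the group that feature f belongs to
      group_cost[g]      -- cost paid once if any feature of group g is selected
    """
    chosen = set(selected)
    # Every selected feature pays its own individual cost.
    cost = sum(individual_cost[f] for f in chosen)
    # Each group touched by the selection is charged exactly once.
    touched_groups = {group_of[f] for f in chosen if f in group_of}
    cost += sum(group_cost[g] for g in touched_groups)
    return cost


def within_budget(cost: float, budget: float) -> bool:
    """Budget constraint: the extraction cost must not exceed the budget."""
    return cost <= budget


# Hypothetical example: two edge features share one expensive edge map.
individual_cost = {"edge_x": 1.0, "edge_y": 1.0, "color_hist": 2.0}
group_of = {"edge_x": "edge_map", "edge_y": "edge_map", "color_hist": "color"}
group_cost = {"edge_map": 5.0, "color": 3.0}

edges_only = total_feature_cost(
    ["edge_x", "edge_y"], individual_cost, group_of, group_cost
)
all_features = total_feature_cost(
    ["edge_x", "edge_y", "color_hist"], individual_cost, group_of, group_cost
)
```

In this sketch the second edge feature adds only its individual cost, because the shared group cost has already been paid. That is the situation in which treating feature costs as independent would overestimate the cost of a selection.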
Database: Networked Digital Library of Theses & Dissertations