Cost-sensitive classifier chains: Selecting low-cost features in multi-label classification
Author: | Damien Zufferey, Paweł Teisseyre, Marta Słomka |
---|---|
Year of publication: | 2019 |
Subject: |
Multi-label classification
Computer science; Stability (learning theory); Feature selection; Machine learning; Logistic regression; Generalization error; Artificial intelligence; Signal processing; Benchmark (computing); Feature (machine learning); Quality (business); Computer vision and pattern recognition; Medical diagnosis; Classifier chains; Software |
Source: | Pattern Recognition. 86:290-319 |
ISSN: | 0031-3203 |
DOI: | 10.1016/j.patcog.2018.09.012 |
Description: | Feature selection is one of the current challenges in multi-label classification, and many methods have been proposed in recent years. However, existing approaches assume that all features have the same cost. This assumption may be inappropriate when acquiring the feature values is costly. For example, in medical diagnosis each diagnostic value obtained from a clinical test has its own cost. In such cases it may be better to choose a model with acceptable classification performance but a much lower cost. We propose a novel method that incorporates feature cost information into the learning process. The method, named Cost-Sensitive Classifier Chains, combines classifier chains and penalized logistic regression with a modified elastic-net penalty that takes the costs of the features into account. We prove the stability of our algorithm and provide a bound on its generalization error. We also propose an adaptive version in which the penalty factors change as the consecutive models in the chain are fitted. The methods are applied to real datasets, MIMIC-II and Hepatitis, for which the cost information is provided by experts. Moreover, we propose an experimental framework in which the features are observed with measurement errors and the costs depend on the quality of the features. The framework makes it possible to compare cost-sensitive methods on benchmark datasets for which cost information is not provided. The proposed method can be recommended when one wants to balance low cost against high prediction performance. |
Database: | OpenAIRE |
External link: |
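
The description above mentions an elastic-net penalty modified to take feature costs into account. As a rough illustration only, the sketch below shows one way such costs could enter the penalty: each coefficient is penalized in proportion to the cost of its feature. The specific form `lam * c_j * (alpha * |b_j| + (1 - alpha) / 2 * b_j**2)`, the function names, and the choice of cost 0 for a feature that is free to use are assumptions made for this example and are not taken from the paper.

```python
def cost_weighted_penalty(beta, costs, lam=1.0, alpha=0.5):
    """Assumed cost-weighted elastic-net penalty (illustrative form only):
    lam * sum_j c_j * (alpha * |b_j| + (1 - alpha) / 2 * b_j ** 2)."""
    return lam * sum(
        c * (alpha * abs(b) + 0.5 * (1.0 - alpha) * b * b)
        for b, c in zip(beta, costs)
    )


def prox_step(beta, costs, lam=1.0, alpha=0.5, step=1.0):
    """Coordinate-wise proximal operator of step * penalty.

    For each coefficient: soft-threshold by the cost-scaled L1 part,
    then shrink by the cost-scaled ridge part.
    """
    updated = []
    for b, c in zip(beta, costs):
        threshold = step * lam * c * alpha           # L1 part, scaled by cost
        magnitude = max(abs(b) - threshold, 0.0)     # soft-thresholding
        signed = magnitude if b >= 0 else -magnitude
        ridge = 1.0 + step * lam * c * (1.0 - alpha) # L2 part, scaled by cost
        updated.append(signed / ridge)
    return updated


# Hypothetical example: a free feature, a cheap feature, an expensive feature.
beta = [1.0, 1.0, -1.0]
costs = [0.0, 1.0, 10.0]
penalty = cost_weighted_penalty(beta, costs, lam=0.5, alpha=0.5)
shrunk = prox_step(beta, costs, lam=0.5, alpha=0.5, step=1.0)
```

In this toy setting the free feature is left untouched, the cheap feature is shrunk, and the expensive feature is set exactly to zero, which is the kind of behaviour that lets a cost-aware penalty trade prediction performance against acquisition cost.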