Biclustering as Strategy for Improving Feature Selection in Consensus QSAR Modeling
Author(s): | Julieta Sol Dussaut, María Jimena Martínez, Ignacio Ponzoni |
---|---|
Year of publication: | 2018 |
Subject: |
0301 basic medicine
Quantitative structure–activity relationship Generalization business.industry Applied Mathematics Feature selection 02 engineering and technology Machine learning computer.software_genre Chemical space Biclustering 03 medical and health sciences 030104 developmental biology Cheminformatics 0202 electrical engineering electronic engineering information engineering Discrete Mathematics and Combinatorics 020201 artificial intelligence & image processing Artificial intelligence High dimensionality business computer Mathematics Interpretability |
Source: | Electronic Notes in Discrete Mathematics. 69:117-124 |
ISSN: | 1571-0653 |
Description: | Feature selection applied to QSAR (Quantitative Structure-Activity Relationship) modeling is a challenging combinatorial optimization problem due to the high dimensionality of the chemical space associated with molecules and the complexity of the physicochemical properties usually studied in cheminformatics. This commonly results in classification models with a large number of variables, which reduces the generalization and interpretability of these classifiers. In this paper, a novel strategy based on biclustering analysis is proposed to address this problem. The new method is applied as a post-processing step to the outputs of consensus feature selection methods. The approach was evaluated on datasets for predicting the ready biodegradability of chemical compounds. Preliminary results show that biclustering can help identify features with low class-discrimination power, which is useful for reducing the complexity of QSAR models without losing prediction accuracy. |
Database: | OpenAIRE |
External link: |
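The abstract does not specify how biclusters are used to flag weak features, so the sketch below is a hypothetical illustration, not the authors' method. It assumes the biclusters (groups of compounds that behave coherently over a subset of descriptors) have already been computed by some biclustering algorithm, and it treats a feature chosen by consensus selection as having low class-discrimination power if it never appears in a bicluster whose compounds mostly share one class. The function names, the `min_purity` threshold, and the toy descriptor names `d1` to `d4` are all assumptions.

```python
from collections import Counter
from typing import List, Sequence, Set, Tuple

# Hypothetical representation: a bicluster is a (row_indices, feature_names)
# pair. Computing these is left to any biclustering algorithm; here they are
# simply given as input.
Bicluster = Tuple[Set[int], Set[str]]


def class_purity(rows: Set[int], labels: Sequence[int]) -> float:
    """Fraction of the bicluster's compounds that belong to its majority class."""
    counts = Counter(labels[r] for r in rows)
    return max(counts.values()) / len(rows)


def filter_selected_features(
    selected: List[str],
    biclusters: List[Bicluster],
    labels: Sequence[int],
    min_purity: float = 0.8,  # assumed threshold, not taken from the paper
) -> List[str]:
    """Post-process a consensus feature selection.

    Hypothetical criterion: keep a selected feature only if it appears in at
    least one bicluster whose compounds are class-pure (purity >= min_purity).
    Features that only co-occur with class-mixed groups are dropped.
    """
    discriminative: Set[str] = set()
    for rows, features in biclusters:
        if rows and class_purity(rows, labels) >= min_purity:
            discriminative |= features
    # Pure filter: preserve the original order of the consensus selection.
    return [f for f in selected if f in discriminative]


if __name__ == "__main__":
    # Toy data: 1 = readily biodegradable, 0 = not (illustrative labels only).
    labels = [1, 1, 1, 0, 0, 0]
    selected = ["d1", "d2", "d3", "d4"]
    biclusters = [
        ({0, 1, 2}, {"d1", "d2"}),  # all class 1 -> pure
        ({0, 3, 4}, {"d3"}),        # mixed classes -> purity 2/3
        ({3, 4, 5}, {"d2", "d4"}),  # all class 0 -> pure
    ]
    print(filter_selected_features(selected, biclusters, labels))
    # prints ['d1', 'd2', 'd4']
```

Here `d3` is removed because its only bicluster mixes both classes. Keeping the step as a filter over the consensus output means it can only shrink the feature set, never add or reorder features, which matches the abstract's description of biclustering as a refinement applied after consensus feature selection.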