An Improved Artificial Bee Colony for Feature Selection in QSAR

Authors: Xiaolin Li, Jing Wang, Shiguo Huang, Yuanzi Zhang, Yanhong Lin
Year of publication: 2021
Subject:
Quantitative structure–activity relationship
lcsh:T55.4-60.8
Computer science
Crossover
Feature selection
02 engineering and technology
lcsh:QA75.5-76.95
Theoretical Computer Science
03 medical and health sciences
feature selection
quantitative structure–activity relationship
0202 electrical engineering, electronic engineering, information engineering
Feature (machine learning)
lcsh:Industrial engineering. Management engineering
artificial bee colony algorithm
030304 developmental biology
Interpretability
Continuous optimization
0303 health sciences
Numerical Analysis
business.industry
Pattern recognition
Artificial bee colony algorithm
Computational Mathematics
Computational Theory and Mathematics
Multicollinearity
020201 artificial intelligence & image processing
lcsh:Electronic computers. Computer science
Artificial intelligence
business
Source: Algorithms
Volume 14
Issue 4
Algorithms, Vol 14, Iss 4, p 120 (2021)
ISSN: 1999-4893
DOI: 10.3390/a14040120
Description: Quantitative structure–activity relationship (QSAR) modeling aims to correlate molecular structure properties with the corresponding bioactivity. Chance correlations and multicollinearity are two major problems often encountered when generating QSAR models. Feature selection can significantly improve the accuracy and interpretability of QSAR models by removing redundant or irrelevant molecular descriptors. The artificial bee colony (ABC) algorithm, which mimics the foraging behavior of honey bee colonies, was originally proposed for continuous optimization problems. It has been applied to feature selection for classification, but seldom for regression analysis and prediction. In this paper, a binary ABC algorithm is used to select features (molecular descriptors) in QSAR. Furthermore, we propose an improved ABC-based algorithm for feature selection in QSAR, namely ABC-PLS-1. Crossover and mutation operators are introduced into the employed bee and onlooker bee phases to modify several dimensions of each solution, which not only removes the step of converting continuous values into discrete values but also reduces the computational resources required. In addition, a novel greedy selection strategy, which favors feature subsets with higher accuracy and fewer features, helps the algorithm converge quickly. Three QSAR datasets are used to evaluate the proposed algorithm. Experimental results show that ABC-PLS-1 outperforms PSO-PLS, WS-PSO-PLS, and BFDE-PLS in accuracy, root mean square error, and the number of selected features. Moreover, we study whether the scout bee phase is needed when tackling regression problems and conclude that it is redundant for feature selection in low-dimensional and medium-dimensional regression problems.
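The loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' ABC-PLS-1 implementation: the one-point crossover, the bit-flip mutation rate, the rank-based onlooker weighting, the tie-breaking rule in `is_better`, and all function and parameter names are assumptions made for illustration. The `fitness` callable is supplied by the user; in a QSAR setting it would presumably score a PLS model built on the selected descriptors, but here it is left abstract. The scout bee phase is omitted, in line with the abstract's conclusion that it is redundant for low- and medium-dimensional regression problems.

```python
import random
from typing import Callable, List, Tuple

Mask = List[int]  # 1 = descriptor kept, 0 = descriptor dropped


def crossover_and_mutate(sol: Mask, partner: Mask, rng: random.Random,
                         mutation_rate: float = 0.05) -> Mask:
    """One-point crossover with a partner, then bit-flip mutation (assumed operators)."""
    point = rng.randrange(1, len(sol))  # cut point strictly inside the mask
    child = sol[:point] + partner[point:]
    child = [1 - b if rng.random() < mutation_rate else b for b in child]
    if not any(child):  # never return an empty feature subset
        child[rng.randrange(len(child))] = 1
    return child


def is_better(cand: Tuple[float, int], curr: Tuple[float, int]) -> bool:
    """Assumed greedy rule: higher score wins; on a tie, fewer features wins."""
    return cand[0] > curr[0] or (cand[0] == curr[0] and cand[1] < curr[1])


def abc_feature_selection(fitness: Callable[[Mask], float], n_features: int,
                          colony_size: int = 10, n_iter: int = 50,
                          seed: int = 0) -> Tuple[Mask, Tuple[float, int]]:
    """Binary ABC without a scout phase; requires n_features >= 2."""
    rng = random.Random(seed)

    def random_mask() -> Mask:
        mask = [rng.randint(0, 1) for _ in range(n_features)]
        if not any(mask):
            mask[rng.randrange(n_features)] = 1
        return mask

    def evaluate(mask: Mask) -> Tuple[float, int]:
        return fitness(mask), sum(mask)

    food = [random_mask() for _ in range(colony_size)]
    scores = [evaluate(m) for m in food]

    def try_improve(i: int) -> None:
        j = rng.choice([k for k in range(colony_size) if k != i])
        child = crossover_and_mutate(food[i], food[j], rng)
        child_score = evaluate(child)
        if is_better(child_score, scores[i]):  # greedy replacement
            food[i], scores[i] = child, child_score

    for _ in range(n_iter):
        for i in range(colony_size):  # employed bee phase
            try_improve(i)
        # Onlooker bee phase: better-ranked sources are visited more often.
        ranked = sorted(range(colony_size),
                        key=lambda k: (scores[k][0], -scores[k][1]))
        weights = [0] * colony_size
        for rank, k in enumerate(ranked):
            weights[k] = rank + 1
        for i in rng.choices(range(colony_size), weights=weights, k=colony_size):
            try_improve(i)

    best = max(range(colony_size), key=lambda k: (scores[k][0], -scores[k][1]))
    return food[best], scores[best]


# Toy usage: a hypothetical fitness that rewards keeping descriptors 0 and 3.
best_mask, (best_score, n_selected) = abc_feature_selection(
    lambda m: float(m[0] + m[3]), n_features=8)
```

Because candidate solutions are built directly from bit masks, no continuous-to-binary conversion step is needed, which is the saving the abstract attributes to the crossover and mutation operators.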
Database: OpenAIRE