An Improved Artificial Bee Colony for Feature Selection in QSAR
Author: | Xiaolin Li, Jing Wang, Shiguo Huang, Yuanzi Zhang, Yanhong Lin |
---|---|
Year of publication: | 2021 |
Subject: |
Quantitative structure–activity relationship; Feature selection; Artificial bee colony algorithm; Crossover; Multicollinearity; Interpretability; Continuous optimization; Feature (machine learning); Pattern recognition; Artificial intelligence; Computer science; Theoretical Computer Science; Numerical Analysis; Computational Mathematics; Computational Theory and Mathematics; lcsh:T55.4-60.8 (Industrial engineering. Management engineering); lcsh:QA75.5-76.95 (Electronic computers. Computer science); 02 engineering and technology; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; 03 medical and health sciences; 0303 health sciences; 030304 developmental biology; business.industry; business |
Source: | Algorithms, Volume 14, Issue 4, p. 120 (2021) |
ISSN: | 1999-4893 |
DOI: | 10.3390/a14040120 |
Description: | Quantitative structure–activity relationship (QSAR) modeling aims to correlate molecular structure properties with the corresponding bioactivity. Chance correlations and multicollinearity are two major problems often encountered when generating QSAR models. Feature selection can significantly improve the accuracy and interpretability of QSAR models by removing redundant or irrelevant molecular descriptors. The artificial bee colony (ABC) algorithm, which mimics the foraging behavior of honey bee colonies, was originally proposed for continuous optimization problems. It has been applied to feature selection for classification but seldom for regression analysis and prediction. In this paper, a binary ABC algorithm is used to select features (molecular descriptors) in QSAR. Furthermore, we propose an improved ABC-based algorithm for feature selection in QSAR, namely ABC-PLS-1. Crossover and mutation operators are introduced into the employed bee and onlooker bee phases to modify several dimensions of each solution, which not only removes the step of converting continuous values into discrete values but also reduces the computational resources required. In addition, a novel greedy selection strategy, which favors feature subsets with higher accuracy and fewer features, helps the algorithm converge quickly. Three QSAR datasets are used to evaluate the proposed algorithm. Experimental results show that ABC-PLS-1 outperforms PSO-PLS, WS-PSO-PLS, and BFDE-PLS in accuracy, root mean square error, and the number of selected features. Moreover, we study whether the scout bee phase is needed when tackling regression problems and conclude that it is redundant for feature selection in low-dimensional and medium-dimensional regression problems. |
Database: | OpenAIRE |
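The abstract describes binary feature masks that are modified by crossover and mutation and then filtered by a greedy selection that prefers higher accuracy and fewer features. The following is a minimal sketch of that idea and not the authors' ABC-PLS-1 implementation. The single-point crossover, the bit-flip mutation, the tie-breaking rule, and all names (`crossover`, `mutate`, `greedy_select`, `toy_error`) are assumptions made for illustration. In a real QSAR setting the `error` callable would be something like the cross-validated RMSE of a PLS model fitted on the selected descriptors.

```python
import random
from typing import Callable, List, Sequence

Mask = List[int]  # 1 = descriptor kept, 0 = descriptor dropped


def crossover(a: Sequence[int], b: Sequence[int], rng: random.Random) -> Mask:
    """Single-point crossover (assumed operator): head of `a`, tail of `b`.

    Requires masks of equal length >= 2.
    """
    point = rng.randrange(1, len(a))  # cut point in 1..len(a)-1
    return list(a[:point]) + list(b[point:])


def mutate(mask: Sequence[int], rate: float, rng: random.Random) -> Mask:
    """Flip each bit independently with probability `rate` (assumed operator)."""
    return [1 - bit if rng.random() < rate else bit for bit in mask]


def greedy_select(
    current: Sequence[int],
    candidate: Sequence[int],
    error: Callable[[Sequence[int]], float],
) -> Mask:
    """Keep the candidate if its error is lower, or equal with fewer features.

    This tie-breaking rule is one possible reading of "higher accuracy and
    fewer features"; the paper's exact rule may differ.
    """
    if sum(candidate) == 0:  # an empty descriptor subset cannot be modeled
        return list(current)
    e_cur, e_new = error(current), error(candidate)
    if e_new < e_cur or (e_new == e_cur and sum(candidate) < sum(current)):
        return list(candidate)
    return list(current)


if __name__ == "__main__":
    rng = random.Random(0)
    target = [1, 0, 1, 0, 0, 1]  # hypothetical "ideal" subset for the toy error

    def toy_error(mask: Sequence[int]) -> float:
        # Stand-in for a model's RMSE: number of bits that differ from `target`.
        return float(sum(x != t for x, t in zip(mask, target)))

    # Four food sources (candidate feature subsets), randomly initialized.
    food = [[rng.randint(0, 1) for _ in target] for _ in range(4)]
    for _ in range(50):
        for i, current in enumerate(food):
            partner = food[rng.randrange(len(food))]
            candidate = mutate(crossover(current, partner, rng), 0.1, rng)
            food[i] = greedy_select(current, candidate, toy_error)
    print(min(food, key=toy_error))
```

Because every operator works directly on 0/1 masks, no continuous-to-binary conversion step is needed, which is the saving the abstract points to. The sketch omits the scout bee phase, in line with the abstract's finding that it is redundant for low- and medium-dimensional regression problems.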