Chemoinformatics and Machine Learning Approaches for Identifying Antiviral Compounds.

Authors: John L; Centre for Molecular Modeling, CSIR-Indian Institute of Chemical Technology, Tarnaka, Hyderabad, 500 007, India.; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, Uttar Pradesh, India., Soujanya Y; Centre for Molecular Modeling, CSIR-Indian Institute of Chemical Technology, Tarnaka, Hyderabad, 500 007, India.; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, Uttar Pradesh, India., Mahanta HJ; Advanced Computation and Data Sciences Division, CSIR-North East Institute of Science and Technology, Jorhat, 785006, Assam, India.; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, Uttar Pradesh, India., Narahari Sastry G; Advanced Computation and Data Sciences Division, CSIR-North East Institute of Science and Technology, Jorhat, 785006, Assam, India.; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, Uttar Pradesh, India.
Language: English
Source: Molecular informatics [Mol Inform] 2022 Apr; Vol. 41 (4), pp. e2100190. Date of Electronic Publication: 2021 Nov 23.
DOI: 10.1002/minf.202100190
Abstract: Current pandemics have propelled research efforts in an unprecedented fashion, primarily triggering computational efforts towards new vaccine and drug development as well as drug repurposing. There is an urgent need to design novel drugs with targeted biological activity and minimal adverse reactions that may be useful in managing viral outbreaks. Hence, an attempt has been made to develop Machine Learning based predictive models that can be used to assess whether a compound has the potential to be antiviral. To this end, a set of 2358 antiviral compounds, whose activity was reported as IC50 values, was compiled from the CAS COVID-19 antiviral SAR dataset. A total of 1157 two-dimensional molecular descriptors were computed, among which the most highly correlated descriptors were selected using Tree-based, Correlation-based and Mutual information-based feature selection methods. Seven Machine Learning algorithms, i.e., Random Forest, XGBoost, Support Vector Machine, KNN, Decision Tree, MLP Classifier and Logistic Regression, were benchmarked. The best performance was achieved by the models developed using the Random Forest and XGBoost algorithms with all the feature selection methods. The maximum predictive accuracy of both these models was 88 % with internal validation. With an external dataset, a maximum accuracy of 93.10 % for the XGBoost based model and 100 % for the Random Forest based model was achieved. Furthermore, the study demonstrated scaffold analysis of the molecules as a pragmatic approach to explore the importance of structurally diverse compounds in data driven studies.
(© 2021 Wiley-VCH GmbH.)
Database: MEDLINE
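
The workflow summarized in the abstract (descriptor selection followed by a benchmark of several classifiers) can be illustrated with a minimal sketch. This is not the authors' code: it assumes scikit-learn, uses synthetic data from `make_classification` in place of the 1157 molecular descriptors, shows only the mutual information-based selection, and benchmarks four of the seven algorithms (XGBoost, SVM and MLP are left out to keep it short). The function name `benchmark`, the value `k=20` and the 5-fold cross-validation are illustrative assumptions.

```python
from functools import partial

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def benchmark(X, y, k=20, seed=0):
    """Return mean 5-fold CV accuracy per classifier (hypothetical helper)."""
    models = {
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "decision_tree": DecisionTreeClassifier(random_state=seed),
        "knn": KNeighborsClassifier(n_neighbors=5),
        "logistic_regression": LogisticRegression(max_iter=1000),
    }
    # Fix the random state so the mutual information estimate is reproducible.
    score_func = partial(mutual_info_classif, random_state=seed)
    scores = {}
    for name, model in models.items():
        # Selection sits inside the pipeline, so it is refit on each
        # training fold and never sees the held-out fold.
        pipe = make_pipeline(
            StandardScaler(),
            SelectKBest(score_func, k=k),
            model,
        )
        cv_scores = cross_val_score(pipe, X, y, cv=5, scoring="accuracy")
        scores[name] = float(cv_scores.mean())
    return scores


# Synthetic stand-in for a descriptor matrix (assumption, not the CAS dataset).
X, y = make_classification(
    n_samples=300, n_features=100, n_informative=10, random_state=0
)
scores = benchmark(X, y)
for name, acc in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: {acc:.3f}")
```

Keeping the feature selection inside the pipeline is a design choice of this sketch: it avoids leaking information from validation folds into the selected descriptor set, which would otherwise inflate the internal-validation accuracy.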