Predicting HIV drug resistance using weighted machine learning method at target protein sequence-level

Authors: Rongao Yuan, Menglong Li, Yanzhi Guo, Qihang Cai, Jian He
Year of publication: 2021
Subject:
Source: Molecular Diversity. 25:1541-1551
ISSN: 1573-501X
1381-1991
Description: Acquired immune deficiency syndrome (AIDS) is a fatal disease caused by the human immunodeficiency virus (HIV). Although 23 different drugs are available, treating AIDS remains challenging because the virus mutates very quickly, which can lead to drug resistance. Predicting drug resistance before treatment is therefore crucial for individualized therapy. Here, based on HIV target protein sequence information, we analyzed resistance to 21 drugs caused by mutated residues using machine learning (ML) methods. To transform target sequences into numeric vectors, seven physicochemical properties were used, which represent the interaction characteristics of the target proteins well. Principal component analysis (PCA) was then adopted to reduce the feature dimensionality. Random forest (RF) and support vector machine (SVM) models with three different kernel functions (linear, polynomial and radial basis function (RBF)) were employed. By comparison, we found that the RBF-based SVM performs comparably to the RF model. Further, we added weight information to the RBF-based SVM using four different weight evaluation methods: RF, eXtreme Gradient Boosting (XGB), CfsSubsetEval and ReliefFAttributeEval. Results show that the RF-weighted RBF-based SVM yields the best performance: 13 of the 21 drug models achieve correlation coefficients (R²) over 0.8, and 3 of them exceed 0.9. Finally, position-specific importance analysis indicates that most of the mutated residues with high RF weight scores are closely related to drug resistance, as revealed in previous reports. Overall, we expect that this method can serve as a supplementary tool for predicting HIV drug resistance for newly discovered mutations.
Here, based on HIV target protein sequence information, we analyzed resistance to 21 drugs caused by mutated residues using machine learning (ML) methods that fuse the weight information of different mutation positions.
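The record does not say how the position weights enter the RBF kernel, so the following is only a minimal sketch of one common way to do it: scaling each position's squared difference by its weight inside the kernel. Everything here is an assumption made for illustration: the single property scale (Kyte-Doolittle hydropathy stands in for the seven unspecified physicochemical properties), the toy four-residue sequences, the hand-picked weights (which in the described approach would come from an importance estimator such as RF), and the names `encode` and `weighted_rbf`.

```python
import math

# Assumption: one physicochemical scale (Kyte-Doolittle hydropathy) stands in
# for the seven properties mentioned in the abstract, which are not listed here.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(seq):
    """Map each residue to its property value (one number per position)."""
    return [HYDROPATHY[aa] for aa in seq]


def weighted_rbf(x, y, weights, gamma=0.1):
    """RBF kernel in which each position's squared difference is scaled by its weight."""
    d2 = sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y))
    return math.exp(-gamma * d2)


# Hypothetical toy sequences that differ only at position 3 (I -> D).
reference = encode("PQIT")
mutant = encode("PQDT")

# Hypothetical weights: uniform versus concentrated on position 3.
uniform = [0.25, 0.25, 0.25, 0.25]
focused = [0.05, 0.05, 0.85, 0.05]

k_uniform = weighted_rbf(reference, mutant, uniform)
k_focused = weighted_rbf(reference, mutant, focused)
k_self = weighted_rbf(reference, reference, focused)

print(k_uniform, k_focused, k_self)
```

With the weight concentrated on the substituted position, the two sequences look less similar to the kernel than under uniform weights, which is the intended effect of letting important positions dominate the similarity measure. A kernel function of this shape could be passed to an SVM implementation that accepts custom kernels, though whether the authors did so is not stated in this record.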
Database: OpenAIRE