Sequence-based predictive modeling to identify cancerlectins
Authors: | Wei Chen, Hong-Yan Lai, Hua Tang, Hao Lin, Xin-Xin Chen |
---|---|
Year of publication: | 2017 |
Subject: |
0301 basic medicine
Support Vector Machine SVM Genomics Machine learning computer.software_genre 03 medical and health sciences Chen cancerlectins Lectins Neoplasms Humans Amino Acid Sequence binomial distribution Databases Protein Sequence biology business.industry optimal tripeptides Reproducibility of Results Feature description biology.organism_classification Support vector machine Binomial distribution 030104 developmental biology ROC Curve Oncology Christian ministry Artificial intelligence business computer Jackknife resampling Algorithms Research Paper |
Source: | Oncotarget |
ISSN: | 1949-2553 |
DOI: | 10.18632/oncotarget.15963 |
Description: | Hong-Yan Lai 1, Xin-Xin Chen 1, Wei Chen 1,2, Hua Tang 3, Hao Lin 1. 1 Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China; 2 Department of Physics, School of Sciences, and Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan, China; 3 Department of Pathophysiology, Southwest Medical University, Luzhou, China. Correspondence to: Hua Tang, email: Tanghua771211@aliyun.com; Hao Lin, email: hlin@uestc.edu.cn. Keywords: cancerlectins, binomial distribution, optimal tripeptides, SVM. Received: January 18, 2017. Accepted: February 24, 2017. Published: March 07, 2017. ABSTRACT: Lectins are a diverse class of glycoproteins or carbohydrate-binding proteins that are widely distributed across species. They specifically recognize and selectively bind particular saccharide groups. Cancerlectins are a group of lectins that are closely related to cancer and play a major role in the initiation, survival, growth, metastasis and spread of tumors. Several computational methods have emerged to discriminate cancerlectins from non-cancerlectins, which promotes the study of the pathogenic mechanisms and clinical treatment of cancer. However, the predictive accuracy of most of these techniques is very limited. In this work, a benchmark dataset was constructed from the CancerLectinDB database, a new amino acid sequence-based strategy for feature description was developed, and the binomial distribution was then applied to screen for the optimal feature set. Finally, an SVM-based predictor was built to distinguish cancerlectins from non-cancerlectins; it achieved an accuracy of 77.48% with an AUC of 85.52% in jackknife cross-validation. The results indicate that the prediction model performs better than previously published predictive tools. |
Database: | OpenAIRE |
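
The feature pipeline summarized in the abstract (tripeptide-based sequence description followed by binomial-distribution screening) can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so everything here is an assumption: the function names are hypothetical, tripeptides are counted with an overlapping window, and each tripeptide is scored with one common confidence-level formulation, CL = 1 - P(X >= n) with X ~ Binomial(N, q), where N is the tripeptide's total count, n its count in one class, and q that class's share of all tripeptide occurrences. The SVM and jackknife steps are not shown.

```python
from collections import Counter
from math import comb

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def tripeptide_counts(seq):
    """Count overlapping tripeptides built only from standard residues."""
    counts = Counter()
    for i in range(len(seq) - 2):
        tri = seq[i:i + 3]
        if all(c in AMINO_ACIDS for c in tri):
            counts[tri] += 1
    return counts


def binomial_tail(n_total, n_obs, q):
    """Return P(X >= n_obs) for X ~ Binomial(n_total, q).

    Exact summation; intended for toy-sized counts only.
    """
    return sum(
        comb(n_total, m) * q**m * (1 - q) ** (n_total - m)
        for m in range(n_obs, n_total + 1)
    )


def rank_tripeptides(pos_seqs, neg_seqs):
    """Rank tripeptides by an assumed binomial confidence level.

    A high score means the tripeptide is over-represented in one class
    beyond what that class's overall share q would explain by chance.
    """
    pos, neg = Counter(), Counter()
    for s in pos_seqs:
        pos.update(tripeptide_counts(s))
    for s in neg_seqs:
        neg.update(tripeptide_counts(s))

    total = sum(pos.values()) + sum(neg.values())
    if total == 0:
        return []
    q_pos = sum(pos.values()) / total
    q_neg = sum(neg.values()) / total

    scores = {}
    for tri in set(pos) | set(neg):
        n_i = pos[tri] + neg[tri]
        cl_pos = 1 - binomial_tail(n_i, pos[tri], q_pos)
        cl_neg = 1 - binomial_tail(n_i, neg[tri], q_neg)
        scores[tri] = max(cl_pos, cl_neg)

    # Highest confidence first; ties broken alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Made-up toy sequences, not data from the paper.
    ranked = rank_tripeptides(["AAAA", "AAAC"], ["CCCC", "CCCA"])
    for tri, cl in ranked:
        print(tri, round(cl, 3))
```

In a full workflow, the top-ranked tripeptides would define the reduced feature vector passed to a classifier; how many to keep is a tuning choice that this sketch leaves open.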