SecProMTB: Support Vector Machine-Based Classifier for Secretory Proteins Using Imbalanced Data Sets Applied to Mycobacterium tuberculosis
Author: | Chaolu Meng, Quan Zou, Leyi Wei |
---|---|
Year of publication: | 2019 |
Subject: | Support Vector Machine; Computer science; Feature selection; Overfitting; Biochemistry; Imbalanced data; Mycobacterium tuberculosis; 03 medical and health sciences; Bacterial Proteins; Databases, Protein; Molecular Biology; 030304 developmental biology; 0303 health sciences; biology; business.industry; 030302 biochemistry & molecular biology; Computational Biology; Pattern recognition; biology.organism_classification; Original data; Test set; Artificial intelligence; business; Classifier (UML); Algorithms |
Source: | Proteomics. 19(17) |
ISSN: | 1615-9861 |
Description: | Secretory proteins of Mycobacterium tuberculosis have attracted growing attention because of their dominant immunogenicity and their role in pathogenesis. Because traditional biochemical experiments are expensive and time-consuming, this study constructs a support vector machine model, named SecProMTB, to identify these proteins with a bioinformatic approach. First, an improved pseudo-amino acid composition (PseAAC) algorithm is used to extract features from all entities. Second, a novel imbalanced-data strategy is proposed and used to divide the original data set into a training set and a test set. Third, to reduce overfitting, feature-ranking algorithms are applied together with incremental feature selection. Finally, the model is trained and optimized. The resulting model achieves an area under the curve of 0.862 and an average accuracy of 86% in the independent test. For the convenience of users, SecProMTB and the related data are openly accessible at http://server.malab.cn/SecProMTB/index.jsp. |
Database: | OpenAIRE |
External link: |
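
The workflow summarized in the description (feature extraction, a class-aware train/test split, then ranked features added incrementally) can be illustrated with a minimal Python sketch. This is not the authors' code and makes several assumptions: plain amino acid composition stands in for the improved PseAAC algorithm, a generic stratified split stands in for the paper's imbalanced-data strategy (which the abstract does not specify), and the `evaluate` callback is a hypothetical placeholder for training and scoring a classifier such as an SVM on a given feature subset. All function names are illustrative.

```python
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac_features(sequence):
    """Return the 20-dimensional amino acid composition of a protein sequence.

    Assumption: plain composition, a simpler relative of PseAAC.
    Non-standard residue symbols are ignored.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split sample indices so each class keeps roughly the same share in both sets.

    Assumption: a generic stratified split, not the paper's own strategy.
    Every class contributes at least one sample to the test set.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = max(1, round(len(indices) * test_fraction))
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


def incremental_feature_selection(ranked_features, evaluate):
    """Grow the feature set one ranked feature at a time; keep the best-scoring prefix.

    `evaluate` is a hypothetical callback that takes a list of features and
    returns a score (higher is better), e.g. cross-validated accuracy.
    """
    best_score, best_subset = float("-inf"), []
    for k in range(1, len(ranked_features) + 1):
        subset = ranked_features[:k]
        score = evaluate(subset)
        if score > best_score:
            best_score, best_subset = score, list(subset)
    return best_subset, best_score
```

In practice, `evaluate` would wrap model training on the training indices only, so that the independent test set plays no part in feature selection.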