Signal-3L 2.0: A Hierarchical Mixture Model for Enhancing Protein Signal Peptide Prediction by Incorporating Residue-Domain Cross-Level Features
Authors: | Hong-Bin Shen, Yi-Ze Zhang |
---|---|
Year of publication: | 2017 |
Subjects: | 0301 basic medicine; Signal peptide; Support Vector Machine; General Chemical Engineering; Protein domain; Computational biology; Library and Information Sciences; Biology; Protein Sorting Signals; Machine learning; 03 medical and health sciences; Protein Domains; Amino Acid Sequence; Integral membrane protein; Peptide sequence; 030102 biochemistry & molecular biology; General Chemistry; Mixture model; Computer Science Applications; Transmembrane domain; 030104 developmental biology; Secretory protein; Artificial intelligence |
Source: | Journal of Chemical Information and Modeling, 57(4) |
ISSN: | 1549-960X |
Abstract: | Signal peptides play key roles in the targeting and translocation of integral membrane proteins and secretory proteins. However, they pose several challenges for automatic prediction methods. First, signal peptides are difficult to discriminate from transmembrane helices, because the H-region of a signal peptide and a transmembrane helix are both hydrophobic. Second, the cleavage site between the signal peptide and the mature protein is difficult to identify, because cleavage motifs or patterns remain unclear for most proteins. To address these problems and further improve automatic signal peptide recognition, we report a new predictor, Signal-3L 2.0. The model follows a hierarchical protocol in which it first determines whether a signal peptide is present. For this step, we propose a new approach driven by residue-domain cross-level features, and we demonstrate that protein functional domain information is particularly useful for discriminating between transmembrane helices and signal peptides, because the two perform different functions. Next, to identify the single cleavage site along the sequence accurately, we designed a top-down approach in which statistical learning rules first screen a subset of candidate cleavage sites, and the final site is then selected according to its evolutionary conservation score. Because this mixed approach combines statistical learning with evolutionary analysis, it shows a strong capacity for recognizing cleavage sites. Signal-3L 2.0 has been benchmarked on multiple data sets, and the experimental results demonstrate its accuracy. The online server is available at www.csbio.sjtu.edu.cn/bioinf/Signal-3L/. |
Database: | OpenAIRE |
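The two-stage protocol described in the abstract (a presence check driven by cross-level features, followed by a top-down cleavage-site search) can be illustrated with a minimal Python sketch. This is not the Signal-3L 2.0 implementation. The `gate` scoring function, `threshold`, `top_k`, and the precomputed statistical and conservation score vectors are hypothetical stand-ins for the paper's trained models and evolutionary analysis; only the overall control flow follows the abstract.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Prediction:
    has_signal_peptide: bool
    # 0-based index into the candidate-position score vectors, or None.
    cleavage_site: Optional[int]


def select_cleavage_site(
    stat_scores: Sequence[float],
    conservation: Sequence[float],
    top_k: int = 5,
) -> int:
    """Top-down selection: screen the top_k positions by statistical score,
    then return the screened candidate with the highest conservation score."""
    if len(stat_scores) != len(conservation):
        raise ValueError("score vectors must have the same length")
    if not stat_scores:
        raise ValueError("no candidate positions")
    # Step 1: rank positions by the (hypothetical) statistical-learning score.
    ranked = sorted(range(len(stat_scores)), key=lambda i: stat_scores[i], reverse=True)
    candidates = ranked[:top_k]
    # Step 2: keep the single most conserved candidate. max() returns the first
    # maximum it meets, so ties go to the candidate with the higher statistical score.
    return max(candidates, key=lambda i: conservation[i])


def predict(
    residue_features: Sequence[float],
    domain_features: Sequence[float],
    gate: Callable[[Sequence[float]], float],
    stat_scores: Sequence[float],
    conservation: Sequence[float],
    threshold: float = 0.5,
    top_k: int = 5,
) -> Prediction:
    # Stage 1: concatenate residue-level and domain-level features into one
    # cross-level vector and score it with a hypothetical gate function.
    features = list(residue_features) + list(domain_features)
    if gate(features) < threshold:
        return Prediction(False, None)
    # Stage 2: only sequences judged to carry a signal peptide get a cleavage site.
    return Prediction(True, select_cleavage_site(stat_scores, conservation, top_k))


if __name__ == "__main__":
    stat = [0.1, 0.9, 0.8, 0.3, 0.7]
    cons = [0.5, 0.2, 0.6, 0.9, 0.4]
    # Toy gate: a residue-level hydrophobic-segment score (f[0]) vetoed by a
    # domain-level transmembrane annotation flag (f[1]).
    toy_gate = lambda f: f[0] * (1.0 - f[1])
    print(select_cleavage_site(stat, cons, top_k=3))  # 2
    print(predict([0.9], [0.0], toy_gate, stat, cons, top_k=3))
    print(predict([0.9], [1.0], toy_gate, stat, cons, top_k=3))
```

In the toy run, position 3 has the highest conservation score but is screened out in the first step, so position 2 is chosen. The second `predict` call shows, in an exaggerated way, how a domain-level feature can overturn a decision that a hydrophobicity-like residue feature alone would make.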