Protein sequence information extraction and subcellular localization prediction with gapped k-Mer method
Authors: | Yuhua Yao, Huimin Xu, Binbin Ji, Nan Xuying, Jing Chen, Ya-ping Lv, Bo Liao, Ling Li, Chun Li |
---|---|
Publication year: | 2019 |
Subject: | Support vector machine; Physicochemical properties; Computer science; Principal component analysis; Information storage and retrieval; Computational biology; lcsh:Computer applications to medicine. Medical informatics; Biochemistry; Medical and health sciences; Clinical medicine; Protein sequencing; Dimension (vector space); Structural biology; Amino acid sequence; Databases, Protein; lcsh:QH301-705.5; Molecular biology; Developmental biology; Health sciences; Dipeptide; Research; Applied mathematics; Proteins; Dipeptides; Subcellular localization; Computer science applications; Information extraction; Statistical classification; lcsh:Biology (General); k-mer; lcsh:R858-859.7; Position-specific score matrix; Gene ontology; DNA microarray; Algorithms; Neurology & neurosurgery; Subcellular fractions |
Source: | BMC Bioinformatics, Vol 20, Iss S22, Pp 1-8 (2019) |
ISSN: | 1471-2105 |
DOI: | 10.1186/s12859-019-3232-4 |
Description: | Background: Predicting the subcellular localization of proteins is an important task in bioinformatics, with applications in drug design and other areas. Many computational tools for protein subcellular localization have been developed in recent decades; however, existing methods differ in the sequence representation techniques and classification algorithms they adopt. Results: In this paper, we first introduce two protein sequence encoding schemes: dipeptide information with space and gapped k-mer information. We then introduce a gapped k-mer calculation method based on a quad-tree. Conclusions: The prediction results show that this method not only reduces the feature dimension but also improves the prediction precision of protein subcellular localization. |
Database: | OpenAIRE |
External link: |
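
The abstract names "dipeptide information with space" as one of the two encoding schemes but does not define it. As a rough illustration only, the sketch below assumes a common formulation: counting pairs of residues separated by a fixed number of intervening positions and normalizing the counts into a 400-dimensional frequency vector. The function name `gapped_dipeptide_features`, the `gap` parameter, and the normalization are assumptions made for this example, not the authors' published method, and the quad-tree based gapped k-mer calculation is not reproduced here.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids; other symbols (e.g. X, B, Z) are skipped.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
_AA_SET = set(AMINO_ACIDS)


def gapped_dipeptide_features(sequence, gap):
    """Return a 400-element list of residue-pair frequencies.

    A pair is (sequence[i], sequence[i + gap + 1]), so gap=0 gives the
    ordinary dipeptide composition. This definition is an assumption
    for illustration, not taken from the paper.
    """
    if gap < 0:
        raise ValueError("gap must be non-negative")
    seq = sequence.upper()
    step = gap + 1
    counts = Counter()
    for i in range(len(seq) - step):
        first, second = seq[i], seq[i + step]
        if first in _AA_SET and second in _AA_SET:
            counts[first + second] += 1
    total = sum(counts.values())
    # Fixed ordering: AA, AC, AD, ..., YY (20 x 20 = 400 entries).
    return [
        counts[a + b] / total if total else 0.0
        for a, b in product(AMINO_ACIDS, repeat=2)
    ]
```

Vectors computed for several gap values could be concatenated and passed to a dimension-reduction step and a classifier such as a support vector machine, both of which appear among the record's subject terms; that pipeline is likewise a guess at the general shape of the approach.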