Protein sequence information extraction and subcellular localization prediction with gapped k-Mer method

Authors: Yuhua Yao, Huimin Xu, Binbin Ji, Nan Xuying, Jing Chen, Ya-ping Lv, Bo Liao, Ling Li, Chun Li
Year of publication: 2019
Subject:
Support Vector Machine
Physicochemical properties
Computer science
Principal component analysis
Information Storage and Retrieval
Computational biology
lcsh:Computer applications to medicine. Medical informatics
computer.software_genre
Biochemistry
03 medical and health sciences
chemistry.chemical_compound
0302 clinical medicine
Protein sequencing
Dimension (vector space)
Structural Biology
Amino Acid Sequence
Databases, Protein
lcsh:QH301-705.5
Molecular Biology
030304 developmental biology
0303 health sciences
Dipeptide
Research
Applied Mathematics
Proteins
Dipeptides
Subcellular localization
Computer Science Applications
Information extraction
Statistical classification
lcsh:Biology (General)
chemistry
k-mer
lcsh:R858-859.7
Position-specific score matrix
Gene ontology
DNA microarray
computer
Algorithms
030217 neurology & neurosurgery
Subcellular Fractions
Source: BMC Bioinformatics
BMC Bioinformatics, Vol 20, Iss S22, Pp 1-8 (2019)
ISSN: 1471-2105
DOI: 10.1186/s12859-019-3232-4
Description: Background: Predicting the subcellular localization of proteins is an important component of bioinformatics, with significant value for drug design and other applications. Many computational tools for protein subcellular localization have been developed in recent decades; however, existing methods differ in the protein sequence representation techniques and classification algorithms they adopt. Results: In this paper, we first introduce two protein sequence encoding schemes: dipeptide information with space and gapped k-mer information. We then introduce a quad-tree-based method for calculating the gapped k-mer information. Conclusions: The prediction results show that this method not only reduces the dimension but also improves the prediction precision of protein subcellular localization.
Database: OpenAIRE