Disease Probability Index (DPI, χ): A new alignment-free scoring method to evaluate the propensities of polypeptide sequences leading to disease onset
Author: | Ananya Ali, Angshuman Bagchi |
---|---|
Year of publication: | 2018 |
Subject: |
0301 basic medicine
Statistics and Probability; Disease onset; Databases, Factual; Sequence analysis; Computational biology; Biology; General Biochemistry, Genetics and Molecular Biology; Late Onset Disorders; 03 medical and health sciences; Single species; Sequence Analysis, Protein; Humans; Single amino acid; Amino Acids; Protein length; Probability; Sequence (medicine); chemistry.chemical_classification; Polymorphism, Genetic; Training set; Applied Mathematics; Computational Biology; General Medicine; Amino acid; 030104 developmental biology; chemistry; Modeling and Simulation; Peptides; Sequence Alignment; Algorithms; Software |
Source: | Biosystems. 172:1-8 |
ISSN: | 0303-2647 |
DOI: | 10.1016/j.biosystems.2018.06.001 |
Description: | Analysis of the amino acid sequences of proteins provides valuable information about protein structure and function. Alignment-free sequence comparison is a comparatively new approach. To date, most, if not all, sequence analysis techniques are used to detect sequence homologies in order to measure the evolutionary relatedness among species. However, a still unexplored avenue in sequence analysis is the comparative estimation of sequence similarities between unrelated protein sequences from within a single species. In this work, we attempted to develop an alignment-free scoring method to study sequences of different human proteins and identify their disease associations. A total of 52 protein sequences were analyzed. The training data set contained 599 reported polymorphic sites and 802 Single Amino acid Variants (SAVs; 708 polymorphic and 94 disease-associated). For cross-validation, another set of 62 protein sequences (26 Enzymes, 16 Membrane-bound Enzymes and 20 Membrane-bound Proteins), with a total of 261 reported polymorphic sites and 799 SAVs (291 polymorphic and 508 disease-associated), was used. In both the training and cross-validation data sets, the percentage of reported disease-associated SAVs correlated negatively with the ratio of polymorphic sites to protein length. A new scoring scheme was also developed that takes this ratio into account by counting the number of polymorphic amino acids and the total number of amino acids in each protein. |
Database: | OpenAIRE |
External link: |
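
The two quantities named in the description, the percentage of disease-associated SAVs and the ratio of polymorphic sites to protein length, can be illustrated with a minimal Python sketch. This is not the DPI (χ) formula itself, which the record does not give. The function names are hypothetical, and the per-protein values passed to `pearson` are invented purely to show what a negative correlation looks like. Only the SAV totals (708/94 and 291/508) come from the description.

```python
from math import sqrt


def disease_percentage(n_polymorphic: int, n_disease: int) -> float:
    """Percentage of SAVs that are disease-associated."""
    total = n_polymorphic + n_disease
    if total == 0:
        raise ValueError("no SAVs reported")
    return 100.0 * n_disease / total


def site_length_ratio(n_polymorphic_sites: int, protein_length: int) -> float:
    """Ratio of polymorphic sites to protein length (in residues)."""
    if protein_length <= 0:
        raise ValueError("protein length must be positive")
    return n_polymorphic_sites / protein_length


def pearson(xs: list[float], ys: list[float]) -> float:
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples of size >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Data-set totals taken from the description above.
training_pct = disease_percentage(n_polymorphic=708, n_disease=94)
validation_pct = disease_percentage(n_polymorphic=291, n_disease=508)

# Hypothetical per-protein values (NOT from the paper), chosen so that
# a higher site:length ratio goes with a lower disease percentage.
ratios = [site_length_ratio(s, 400) for s in (4, 8, 12, 16)]
percentages = [80.0, 60.0, 40.0, 20.0]
r = pearson(ratios, percentages)

print(f"training: {training_pct:.2f}% disease-associated SAVs")
print(f"cross-validation: {validation_pct:.2f}% disease-associated SAVs")
print(f"illustrative correlation: r = {r:.2f}")
```

On real data, `ratios` and `percentages` would hold one entry per protein, and a negative `r` would correspond to the trend the description reports.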