Disease Probability Index (DPI, χ): A new alignment-free scoring method to evaluate the propensities of polypeptide sequences leading to disease onset

Authors: Ananya Ali, Angshuman Bagchi
Year of publication: 2018
Subject:
Source: Biosystems. 172:1-8
ISSN: 0303-2647
DOI: 10.1016/j.biosystems.2018.06.001
Description: Analysis of the amino acid sequences of proteins provides valuable information about protein structure and function. Alignment-free sequence comparison is a comparatively new approach. To date, most, if not all, sequence analysis techniques have been used to identify sequence homologies in order to measure the evolutionary relatedness among species. However, a still unexplored avenue in sequence analysis is building a comparative estimate of the sequence similarities between unrelated protein sequences within a single species. In this work, we attempted to develop an alignment-free scoring method for studying sequences of different human proteins in order to identify their disease associations. A total of 52 protein sequences were analyzed. The training data set contained 599 reported polymorphic sites and 802 Single Amino acid Variants (SAVs), of which 708 were polymorphic and 94 disease-associated. For cross-validation, another set of 62 protein sequences (26 enzymes, 16 membrane-bound enzymes and 20 membrane-bound proteins) was used, with a total of 261 reported polymorphic sites and 799 SAVs (291 polymorphic and 508 disease-associated). In both the training and cross-validation data sets, a negative correlation was observed between the percentage of reported disease-associated SAVs and the ratio of polymorphic sites to protein length. A new scoring scheme was also developed that takes this ratio into account by counting the number of polymorphic amino acids and the total number of amino acids in each protein.
Database: OpenAIRE
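
The two quantities the abstract correlates, the ratio of polymorphic sites to protein length and the percentage of disease-associated SAVs, can be illustrated with a minimal Python sketch. This is not the DPI (χ) formula, which the abstract does not state. The function names and the per-protein toy values below are hypothetical and chosen only for illustration. The use of Pearson's coefficient is also an assumption, since the abstract does not name the correlation measure. Only the aggregate SAV counts (94 of 802 and 508 of 799) come from the abstract.

```python
from math import sqrt


def site_ratio(polymorphic_sites: int, protein_length: int) -> float:
    """Ratio of polymorphic sites to protein length (residues)."""
    if protein_length <= 0:
        raise ValueError("protein_length must be positive")
    return polymorphic_sites / protein_length


def disease_percentage(disease_savs: int, total_savs: int) -> float:
    """Percentage of reported SAVs that are disease-associated."""
    if total_savs <= 0:
        raise ValueError("total_savs must be positive")
    return 100.0 * disease_savs / total_savs


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient (assumed measure, see lead-in)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples of size >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


# Aggregate SAV counts reported in the abstract.
training_pct = disease_percentage(94, 802)
validation_pct = disease_percentage(508, 799)

# Hypothetical toy proteins, NOT data from the paper:
# (length, polymorphic sites, polymorphic SAVs, disease-associated SAVs)
toy_proteins = [
    (300, 3, 1, 9),
    (400, 12, 6, 6),
    (500, 30, 16, 4),
    (250, 20, 18, 2),
]

ratios = [site_ratio(sites, length) for length, sites, _, _ in toy_proteins]
percentages = [
    disease_percentage(disease, poly + disease)
    for _, _, poly, disease in toy_proteins
]
r = pearson(ratios, percentages)

print(f"training set: {training_pct:.2f}% disease-associated SAVs")
print(f"cross-validation set: {validation_pct:.2f}% disease-associated SAVs")
print(f"toy correlation between ratio and disease percentage: {r:.3f}")
```

With the toy values, proteins with a higher site-to-length ratio carry a lower share of disease-associated SAVs, so the coefficient comes out negative. That matches the direction of the trend the abstract reports, though not its magnitude.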