Disease Probability Index (DPI, χ): A new alignment-free scoring method to evaluate the propensities of polypeptide sequences leading to disease onset

Authors:	Ananya Ali, Angshuman Bagchi
Publication year:	2018
Subject:	0301 basic medicine; Statistics and Probability; Disease onset; Databases, Factual; Sequence analysis; Computational biology; Biology; General Biochemistry, Genetics and Molecular Biology; Late Onset Disorders; 03 medical and health sciences; Single species; Sequence Analysis, Protein; Humans; Single amino acid; Amino Acids; Protein length; Probability; Sequence (medicine); chemistry.chemical_classification; Polymorphism, Genetic; Training set; Applied Mathematics; Computational Biology; General Medicine; Amino acid; 030104 developmental biology; chemistry; Modeling and Simulation; Peptides; Sequence Alignment; Algorithms; Software
Source:	Biosystems. 172:1-8
ISSN:	0303-2647
DOI:	10.1016/j.biosystems.2018.06.001
Description:	Analysis of the amino acid sequences of proteins provides valuable information about protein structure and function. Alignment-free sequence comparison is a comparatively new approach. To date, most, if not all, sequence analysis techniques are used to find sequence homologies in order to measure evolutionary relatedness among species. However, a still unexplored avenue in sequence analysis is building a comparative estimate of sequence similarity between unrelated protein sequences from, and within, a single species. In this work, we sought to develop an alignment-free scoring method to study sequences of different human proteins and identify their disease associations. A total of 52 protein sequences were analyzed. The training data set contained 599 reported polymorphic sites and 802 Single Amino acid Variants (SAVs; 708 polymorphic and 94 disease-associated). For cross-validation, another set of 62 protein sequences (26 enzymes, 16 membrane-bound enzymes and 20 membrane-bound proteins), with a total of 261 reported polymorphic sites and 799 SAVs (291 polymorphic and 508 disease-associated), was used. In both the training and cross-validation data sets, a negative correlation was observed between the percentage of reported disease-associated SAVs and the ratio of polymorphic sites to protein length. A new scoring pattern was also developed that takes this ratio into account by counting the number of polymorphic amino acids and the total number of amino acids in each protein.
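The two quantities compared in the description can be illustrated with a minimal Python sketch. It assumes one record per protein and computes the ratio of polymorphic sites to protein length, the percentage of disease-associated SAVs, and a plain Pearson correlation between the two. The `ProteinRecord` type, the function names, and the three toy records are hypothetical and are not taken from the paper. The abstract does not state the DPI (χ) formula itself, so it is not reproduced here.

```python
from math import sqrt
from typing import NamedTuple, Sequence


class ProteinRecord(NamedTuple):
    """Hypothetical per-protein summary; field names are assumptions."""
    name: str
    length: int             # total number of amino acids
    polymorphic_sites: int  # reported polymorphic positions
    disease_savs: int       # SAVs annotated as disease-associated
    total_savs: int         # all reported SAVs


def site_ratio(p: ProteinRecord) -> float:
    """Ratio of polymorphic sites to protein length."""
    return p.polymorphic_sites / p.length


def disease_percentage(p: ProteinRecord) -> float:
    """Percentage of reported SAVs that are disease-associated."""
    return 100.0 * p.disease_savs / p.total_savs


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy values for illustration only; they are NOT data from the paper.
toy_records = [
    ProteinRecord("toy_A", length=200, polymorphic_sites=20, disease_savs=1, total_savs=10),
    ProteinRecord("toy_B", length=400, polymorphic_sites=20, disease_savs=5, total_savs=10),
    ProteinRecord("toy_C", length=500, polymorphic_sites=10, disease_savs=9, total_savs=10),
]

ratios = [site_ratio(p) for p in toy_records]
percentages = [disease_percentage(p) for p in toy_records]
r = pearson(ratios, percentages)
print(f"Pearson r between site ratio and disease-SAV percentage: {r:.3f}")
```

The toy records were chosen so that the coefficient comes out negative, matching the direction of the relationship the description reports. Real input would come from curated variant annotations for each protein.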
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::7f2dc610c67aa9e7d4bc686387cfb0c6 https://doi.org/10.1016/j.biosystems.2018.06.001