On comparing composition principles of long DNA sequences with those of random ones
Authors: | Sergey V. Petoukhov, Markus Gumbel, Elena Fimmel, Ali Karpuzoglu |
---|---|
Year of publication: | 2019 |
Subject: | Statistics and Probability; Transcription, Genetic; Generalization; General Biochemistry, Genetics and Molecular Biology; DNA sequencing; Evolution, Molecular; Combinatorics; 03 medical and health sciences; chemistry.chemical_compound; 0302 clinical medicine; Chargaff's rules; Animals; Humans; Selection, Genetic; 030304 developmental biology; Mathematics; Base Composition; 0303 health sciences; Genome; Models, Genetic; Genome, Human; Applied Mathematics; DNA; Genomics; Sequence Analysis, DNA; General Medicine; Composition (combinatorics); Markov Chains; chemistry; Modeling and Simulation; GenBank; Algorithms; 030217 neurology & neurosurgery |
Source: | Biosystems. 180:101-108 |
ISSN: | 0303-2647 |
DOI: | 10.1016/j.biosystems.2019.04.003 |
Description: | Revealing the compositional principles that organize long DNA sequences is one of the crucial tasks in the study of biosystems. This paper analyzes the compositional differences between real DNA sequences and similar sequences generated randomly by Markov-like models. Among other things, we formulate a generalization of Chargaff's second rule and verify it empirically on DNA sequences of five model organisms taken from GenBank. We then apply the same frequency analysis to the simulated sequences. Comparing the real and random sequences reveals both significant similarities and essential differences, which are described. The significance and possible origin of these differences, including from the viewpoint of maximum informativeness of genetic texts, are discussed. In addition, the paper addresses the question of what counts as a “long” DNA sequence and quantifies the choice of length: the standard deviations of the relative frequencies of bases stabilize from a length of approximately 100,000 bases onward, whereas at a length of approximately 25,000 bases the deviations are about three times as large. |
Database: | OpenAIRE |
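The description combines two quantitative ideas. The first is single-strand parity between a word and its reverse complement, which is Chargaff's second rule. The second is the spread of base frequencies across stretches of a given length. Both can be illustrated with the short Python sketch below, which is not the authors' code. It does not reproduce the paper's generalization of the rule, only the classical k-mer form. The following are illustrative assumptions: the Markov-chain order, fitting the chain on a circularized template, the use of non-overlapping windows, the max-deviation summary, and all function names (`kmer_frequencies`, `chargaff2_deviation`, `markov_sequence`, `window_base_sd`).

```python
import random
from collections import Counter
from statistics import pstdev

# Watson-Crick complement table for the DNA alphabet.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(s: str) -> str:
    """Reverse complement of a DNA string, e.g. 'AACG' -> 'CGTT'."""
    return s.translate(COMPLEMENT)[::-1]


def kmer_frequencies(seq: str, k: int) -> dict[str, float]:
    """Relative frequencies of overlapping k-mers in a single strand."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()}


def chargaff2_deviation(seq: str, k: int) -> float:
    """Largest |f(w) - f(revcomp(w))| over the observed k-mers w.

    Chargaff's second parity rule predicts values near 0 for long sequences.
    (Using the maximum as the summary statistic is an assumption.)
    """
    freqs = kmer_frequencies(seq, k)
    return max(abs(f - freqs.get(reverse_complement(w), 0.0))
               for w, f in freqs.items())


def markov_sequence(template: str, order: int, length: int,
                    rng: random.Random) -> str:
    """Random sequence from an order-`order` Markov chain fitted to `template`.

    The template is treated as circular so that every reachable context
    has at least one observed successor (assumes order < len(template)).
    """
    wrapped = template + template[:order]
    counts: dict[str, Counter] = {}
    for i in range(len(template)):
        ctx = wrapped[i:i + order]
        counts.setdefault(ctx, Counter())[wrapped[i + order]] += 1
    # Pre-split each context's successor counts into letters and weights.
    table = {ctx: (list(c.keys()), list(c.values()))
             for ctx, c in counts.items()}
    out = list(template[:order])
    while len(out) < length:
        # out[len(out) - order:] is the last `order` letters ('' when order == 0).
        letters, weights = table["".join(out[len(out) - order:])]
        out.append(rng.choices(letters, weights=weights)[0])
    return "".join(out[:length])


def window_base_sd(seq: str, window: int, base: str = "A") -> float:
    """Population SD of the relative frequency of `base` across
    non-overlapping windows of length `window`."""
    fracs = [seq[i:i + window].count(base) / window
             for i in range(0, len(seq) - window + 1, window)]
    return pstdev(fracs)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical stand-in for a GenBank sequence; substitute real DNA here.
    template = "".join(rng.choices("ACGT", k=50_000))
    simulated = markov_sequence(template, order=2, length=400_000, rng=rng)
    print("max 3-mer parity deviation:", chargaff2_deviation(simulated, 3))
    for w in (25_000, 100_000):
        print(w, window_base_sd(simulated, w))
```

To compare real and random sequences in the spirit of the description, one would run the same two measurements on a real sequence and on a Markov surrogate fitted to it. The window lengths 25,000 and 100,000 mirror the lengths named in the description. With only a few windows per length, the SD estimates here are themselves noisy.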