Entropy and long-range correlations in DNA sequences

Authors: S. S. Melnik, O. V. Usatenko
Publication year: 2014
Subjects:
Entropy
Molecular Sequence Data
Binary number
FOS: Physical sciences
Condensed Matter - Soft Condensed Matter
Biochemistry
Differential entropy
Structural Biology
Ergodic theory
Entropy (information theory)
Animals
Additive Markov chain
Statistical physics
Condensed Matter - Statistical Mechanics
Mathematics
Genome
Markov chain
Statistical Mechanics (cond-mat.stat-mech)
Base Sequence
Organic Chemistry
Conditional probability
Chromosome Mapping
Probability and statistics
Sequence Analysis
DNA
Other Quantitative Biology (q-bio.OT)
Quantitative Biology - Other Quantitative Biology
Markov Chains
Computational Mathematics
Drosophila melanogaster
FOS: Biological sciences
Physics - Data Analysis, Statistics and Probability
Soft Condensed Matter (cond-mat.soft)
Data Analysis, Statistics and Probability (physics.data-an)
Bacillus subtilis
Source: Computational Biology and Chemistry
ISSN: 1476-928X
Description: We analyze the structure of DNA molecules of different organisms using the additive Markov chain approach. Transforming nucleotide sequences into binary strings, we perform a statistical analysis of the corresponding "texts". We develop the theory of N-step additive binary stationary ergodic Markov chains and analyze their differential entropy. Assuming that the correlations are weak, we express the conditional probability function of the chain in terms of the pair correlation function and represent the entropy as a functional of the pair correlator. Because the model uses two-point correlators instead of block occurrence probabilities, it makes it possible to calculate the entropy of subsequences at much longer distances than standard methods allow. We use the resulting analytical expression to evaluate numerically the entropy of coarse-grained DNA texts. We believe that the entropy study can be used for the biological classification of living species.
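As a rough illustration of the first two steps named in the description (mapping nucleotides to a binary string and estimating the pair correlator), a minimal Python sketch follows. The purine/pyrimidine mapping and the names `to_binary` and `pair_correlator` are assumptions made for illustration; this record does not specify the mapping or estimator the authors used.

```python
# Hypothetical mapping (an assumption, not stated in this record):
# purines (A, G) -> 0, pyrimidines (C, T) -> 1.
PURINE_PYRIMIDINE = {"A": 0, "G": 0, "C": 1, "T": 1}


def to_binary(seq, mapping=PURINE_PYRIMIDINE):
    """Map a nucleotide string to a list of 0/1 symbols, skipping unknown letters."""
    return [mapping[ch] for ch in seq.upper() if ch in mapping]


def pair_correlator(bits, r):
    """Empirical two-point correlator C(r) = <a_i * a_(i+r)> - <a>^2."""
    n = len(bits) - r
    if n <= 0:
        raise ValueError("r must be smaller than the sequence length")
    mean = sum(bits) / len(bits)
    return sum(bits[i] * bits[i + r] for i in range(n)) / n - mean * mean


if __name__ == "__main__":
    bits = to_binary("ACGTACGTTTGACA")
    # Print the correlator at a few short distances.
    for r in range(1, 4):
        print(r, pair_correlator(bits, r))
```

Only two-point statistics are estimated here, which is the property the description credits with reaching longer distances than block-frequency counting.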
8 pages, 5 figures
Database: OpenAIRE