Drosophila 3' UTRs are more complex than protein-coding sequences

Author: Kerrie Mengersen, Jonathan M. Keith, Edward Tasker, Manjula Algama, Christopher Oldmeadow
Year of publication: 2013
Subject:
Untranslated region
010000 MATHEMATICAL SCIENCES
01 natural sciences
010104 statistics & probability
Melanogaster
Transversion
3' Untranslated Regions
Genome Evolution
Genetics
0303 health sciences
Multidisciplinary
Genomics
Functional Genomics
Regulatory sequence
Physical Sciences
Medicine
Drosophila
Sequence databases
Drosophila melanogaster
Multiple alignment calculation
Untranslated regions
Sequence Analysis
Statistics (Mathematics)
Research Article
Markov Models
Science
Molecular Sequence Data
Sequence alignment
Computational biology
Biology
Biostatistics
Genome Complexity
03 medical and health sciences
Open Reading Frames
Species Specificity
Animals
0101 mathematics
Statistical Methods
Molecular Biology Techniques
Sequencing Techniques
Molecular Biology
060100 BIOCHEMISTRY AND CELL BIOLOGY
030304 developmental biology
Base Sequence
Models, Genetic
Genetic Variation
Biology and Life Sciences
Bayes Theorem
Comparative Genomics
biology.organism_classification
Genome Analysis
Probability Theory
Sequence motif analysis
GC-content
Mathematics
Source: PLoS ONE
PLoS ONE, Vol 9, Iss 5, p e97336 (2014)
ISSN: 1932-6203
Description: The 3' UTRs of eukaryotic genes participate in a variety of post-transcriptional (and some transcriptional) regulatory interactions. Some of these interactions are well characterised, but an undetermined number remain to be discovered. While some regulatory sequences in 3' UTRs may be conserved over long evolutionary time scales, others may have only ephemeral functional significance as regulatory profiles respond to changing selective pressures. Here we propose a sensitive segmentation methodology for investigating patterns of composition and conservation in 3' UTRs based on comparison of closely related species. We describe encodings of pairwise and three-way alignments that integrate information about conservation, GC content and transition/transversion ratios, and apply the method to three closely related Drosophila species: D. melanogaster, D. simulans and D. yakuba. Incorporating multiple data types greatly increased the number of segment classes identified compared to similar methods based on conservation or GC content alone. We propose that the number of segments and the number of segment types identified by the method can be used as proxies for functional complexity. Our main finding is that the number of segments and segment classes identified in 3' UTRs is greater than in the same length of protein-coding sequence, suggesting greater functional complexity in 3' UTRs. There is thus a need for sustained and extensive efforts by bioinformaticians to delineate functional elements in this important genomic fraction. C code, data and results are available upon request.
Database: OpenAIRE