A new parameter to study compositional properties of non-coding regions in eukaryotic genomes.

Authors: Bultrini E; Dipartimento di Malattie Infettive, Parassitarie ed Immunomediate, Istituto Superiore di Sanità, Viale Regina Elena, 299, 00161 Roma, Italy; Pizzi E
Language: English
Source: Gene 2006 Dec 30; Vol. 385, pp. 75-82. Date of Electronic Publication: 2006 Aug 09.
DOI: 10.1016/j.gene.2006.05.030
Abstract: Genomes are characterized by global and local compositional properties that are interesting from an evolutionary perspective and also provide useful information for identifying some functional elements. Following previous studies, in this work we investigated the compositional properties of non-coding sequences in four eukaryotic genomes (C. elegans, D. melanogaster, M. musculus, H. sapiens). We developed a procedure based on Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) to identify pentamers that are over-represented in introns (the intron vocabulary) and to define a new parameter (LD) that reflects the oligonucleotide composition of a given sequence. Analyzing genomic sequences, we found that all non-coding parts of a genome are characterized by similar LD values. We then used the new parameter to analyze potentially regulatory regions: we extracted non-redundant sets of promoter sequences for D. melanogaster and H. sapiens and studied their compositional (G+C content and LD parameter) and conformational (bendability propensity) properties. We found that the regions immediately surrounding transcription start sites are distinguishable by their %G+C, LD and bendability values.
Database: MEDLINE
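The abstract does not give the exact definition of LD or the discriminant weights, so the following is only a minimal sketch of the kind of inputs such an analysis works with, not the authors' procedure. It computes %G+C and a 1024-dimensional vector of overlapping pentamer frequencies, then projects that vector onto a linear direction with a dot product, which is the form a linear discriminant score takes. The function names (`gc_percent`, `pentamer_frequencies`, `linear_score`) are hypothetical. Any weights passed to `linear_score` are assumptions: in an LDA-based method they would be learned from training sequences such as introns versus other regions.

```python
from itertools import product

BASES = "ACGT"
# All 4**5 = 1024 possible pentamers, in a fixed order so vectors are comparable.
PENTAMERS = ["".join(p) for p in product(BASES, repeat=5)]


def gc_percent(seq: str) -> float:
    """Percentage of G or C among the unambiguous A/C/G/T bases of seq."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in BASES)
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def pentamer_frequencies(seq: str) -> list:
    """Relative frequencies of overlapping pentamers, in PENTAMERS order.

    Windows containing anything other than A/C/G/T (e.g. N) are skipped.
    """
    seq = seq.upper()
    counts = dict.fromkeys(PENTAMERS, 0)
    total = 0
    for i in range(len(seq) - 4):
        word = seq[i:i + 5]
        if word in counts:
            counts[word] += 1
            total += 1
    if total == 0:
        return [0.0] * len(PENTAMERS)
    return [counts[w] / total for w in PENTAMERS]


def linear_score(freqs: list, weights: list) -> float:
    """Project a frequency vector onto a (hypothetical) discriminant direction."""
    if len(freqs) != len(weights):
        raise ValueError("freqs and weights must have the same length")
    return sum(f * w for f, w in zip(freqs, weights))


if __name__ == "__main__":
    promoter_like = "GGGCGGGGCCGCGCTATAAAAGGCGC"
    print("%G+C:", round(gc_percent(promoter_like), 1))
    freqs = pentamer_frequencies(promoter_like)
    print("distinct pentamers seen:", sum(1 for f in freqs if f > 0))
```

Counting overlapping windows uses every position of the sequence, and skipping ambiguous windows keeps runs of N in assembled genomic sequence from distorting the frequencies.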