Identification of transcribed protein coding sequence remnants within lincRNAs
Author: | Miguel A. Andrade-Navarro, Sweta Talyan, Enrique M. Muro |
---|---|
Year of publication: | 2018 |
Subject: |
0301 basic medicine
Transposable element; Sequence analysis; Pseudogene; Retrotransposon; Computational biology; Biology; Open Reading Frames; 03 medical and health sciences; 0302 clinical medicine; Intergenic region; Sequence Analysis, Protein; Genetics; Humans; Amino Acid Sequence; Gene; Regulation of gene expression; Base Sequence; Sequence Analysis, RNA; Computational Biology; 030104 developmental biology; Gene Expression Regulation; DNA, Intergenic; RNA, Long Noncoding; Sequence Alignment; Algorithms; 030217 neurology & neurosurgery; Biogenesis |
Source: | Nucleic Acids Research |
ISSN: | 1362-4962 0305-1048 |
DOI: | 10.1093/nar/gky608 |
Description: | Long intergenic non-coding RNAs (lincRNAs) are non-coding transcripts >200 nucleotides long that do not overlap protein-coding sequences. Importantly, such elements are known to be tissue-specifically expressed and to play a widespread role in gene regulation across thousands of genomic loci. However, very little is known of the mechanisms for the evolutionary biogenesis of these RNA elements, especially given their poor conservation across species. It has been proposed that lincRNAs might arise from pseudogenes. To test this systematically, we developed a novel method that searches for remnants of protein-coding sequences within lincRNA transcripts; the hypothesis is that we can trace back their biogenesis from protein-coding genes or posterior transposon/retrotransposon insertions. Applying this method, we found 203 human lincRNA genes with regions significantly similar to protein-coding sequences. Our method provides a visualization tool to trace the evolutionary biogenesis of lincRNAs with respect to protein-coding genes by sequence divergence. Subsequently, we show the expression correlation between lincRNAs and their identified parental protein-coding genes using public RNA-seq repositories, hinting at novel gene regulatory relationships. In summary, we developed a novel computational methodology to study non-coding gene sequences, which can be applied to identify the evolutionary biogenesis and function of lincRNAs. |
Database: | OpenAIRE |
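
The abstract mentions correlating the expression of each lincRNA with that of its identified parental protein-coding gene across RNA-seq samples. The record does not say which correlation measure the authors used, so the following is only a minimal sketch that assumes a plain Pearson coefficient; the function name `pearson` and the expression values are hypothetical and are not taken from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors.

    Assumption: one value per RNA-seq sample, in the same sample order
    for both genes. This is an illustrative choice, not the paper's method.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical expression levels (arbitrary units) across five samples.
lincrna_expr = [0.5, 1.2, 3.1, 0.9, 2.4]
parent_gene_expr = [10.0, 14.5, 30.2, 12.1, 25.0]
r = pearson(lincrna_expr, parent_gene_expr)
print(f"Pearson r = {r:.3f}")
```

A rank-based measure such as Spearman correlation would be a reasonable alternative when expression values are heavily skewed; which one applies here would need to be checked against the paper itself.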