Description: |
Hundreds of thousands of putative small ORF (smORF) sequences are present in eukaryotic genomes, potentially coding for peptides of fewer than 100 amino acids. smORFs have been deemed non-coding on the basis of their high numbers and their small size, which makes it extremely challenging to assess their functionality both bioinformatically and biochemically. The recently developed Ribo-Seq technique, the deep sequencing of ribosome footprints, has generated significant controversy by showing extensive translation of smORFs outside annotated protein-coding regions, including in putative non-coding RNAs. Our lab adapted the Ribo-Seq technique by combining it with polysome fractionation in order to assess smORF translation in Drosophila S2 cells. This thesis provides a high-throughput assessment of smORF translation in Drosophila melanogaster. Firstly, I implement complementary techniques, such as transfection-tagging and mass spectrometry, to provide independent corroboration of the S2 cell data (Chapter 3). Secondly, in order to expand the catalogue of translated smORFs, I significantly improve the yield and sequencing efficiency of the Poly-Ribo-Seq protocol while adapting it to Drosophila embryos, and then apply it across embryogenesis, divided into Early, Mid and Late stages (Chapter 4). There is still considerable debate in the field regarding Ribo-Seq data analysis, and various computational metrics have been developed to distinguish ‘real’ translation events from background noise. Chapter 5 explores some of these metrics and establishes a translation cut-off suitable for designating small ORFs as translated. Altogether, the improvements introduced to the protocol and my data analysis show the translation of 500 annotated smORFs, 500 smORFs in long non-coding RNAs and 5,000 uORFs, of which only one-third of each type of smORF has previous evidence of translation.
These findings strengthen the case for smORFs as a distinct class of genes that significantly expands the protein-coding complement of the genome. |