Deep transcriptome annotation enables the discovery and functional characterization of cryptic small proteins

Authors: Xavier Roucou, Jean-François Lucier, Christian R. Landry, Darel J. Hunting, Maxime C. Beaudoin, Benoît Vanderperre, Jules Gagnon, Aïda Ouangraoua, Julie Motard, Isabelle Gagnon-Arsenault, Isabelle Fournier, Mylaine Breton, Annie V Roy, Sondos Samandi, Jean-François Jacques, Alan A. Cohen, Michelle S. Scott, Mylène Brunelle, Vivian Delcourt
Publication year: 2017
Subject:
Source: eLife, Vol 6 (2017)
ISSN: 2050-084X
Description: Recent functional, proteomic and ribosome profiling studies in eukaryotes have concurrently demonstrated the translation of alternative open reading frames (altORFs) in addition to annotated protein coding sequences (CDSs). We show that a large number of small proteins could in fact be coded by these altORFs. The putative alternative proteins translated from altORFs have orthologs in many species and contain functional domains. Evolutionary analyses indicate that altORFs often show more extreme conservation patterns than their CDSs. Thousands of alternative proteins are detected in proteomic datasets by reanalysis using a database containing predicted alternative proteins. This is illustrated with specific examples, including altMiD51, a 70 amino acid mitochondrial fission-promoting protein encoded in MiD51/Mief1/SMCR7L, a gene encoding an annotated protein promoting mitochondrial fission. Our results suggest that many genes are multicoding genes and code for a large protein and one or several small proteins.
eLife digest: Proteins are often referred to as the workhorses of the cell, and these molecules affect all aspects of human health and disease. Thus, deciphering the entire set of proteins made by an organism is often an important challenge for biologists. Genes contain the instructions to make a protein, but first they must be copied into a molecule called an mRNA. The part of the mRNA that actually codes for the protein is referred to as an open reading frame (or ORF for short). For many years, most scientists assumed that, except in bacteria, each mature mRNA in an organism has just a single functional ORF, and that this was generally the longest possible ORF within the mRNA. Many also assumed that RNAs copied from genes that had been labelled as “non-coding” or as “pseudogenes” did not contain functional ORFs. Yet, new ORFs encoding small proteins were recently discovered in RNAs (or parts of RNA) that had previously been annotated as non-coding. Working out what these small proteins actually do will require scientists to be able to find more of these overlooked ORFs. The RNAs produced by many organisms, from humans and mice to fruit flies and yeast, have been catalogued and the data stored in publicly accessible databases. Samandi, Roy et al. have now taken a fresh look at the data for nine different organisms, and identified several thousand examples of possibly overlooked ORFs, which they refer to as “alternative ORFs”. These included more than 180,000 from humans. Further analysis of other datasets that captured details of the proteins actually produced in human cells uncovered thousands of small proteins encoded by the predicted alternative ORFs. Many of the so-called alternative proteins also resembled parts of other proteins that have a known activity or function. Lastly, Samandi, Roy et al. focused on two alternative proteins and showed that they both might affect the activity of the proteins coded within the main ORF in their respective genes.
These findings reveal new details about the different proteins encoded within the genes of humans and other organisms, including that many mRNAs encode more than one protein. The implications and applications of this research could be far-reaching, and may help scientists to better understand how genes work in both health and disease.
Database: OpenAIRE