High-Throughput Gene Discovery in the Rat

Authors: Sara Holte, Justin Coco, Clayton L. Birkett, Kelly Schaefer, Christina H. Smith, Susan A. Baumes, Robert H. Brown, Nishank Trivedi, Chris Estes, Brian O'Leary, Chad A. Roberts, Jim Conklin, Micca Donohue, Rebecca S. Reiter, M. Bento Soares, A. Jason Grundstad, Keith Crouch, Chris Moressi, Mark Lebeck, Tom C. Freeman, Kevin Pedretti, Jennifer J.S. Laffin, Mindee Perdue, Jack M. Gardiner, Ling Qui, Thomas L. Casavant, Rikki Kreger, Shereen Chang, Michael F. Smith, Allen J. Gavin, Brad Johnson, Catherine Keppel, Natalie L. Robinson, Kurtis Trout, Katrina Fishler, Mari E. Eyestone, Dylan Tack, Brian Berger, Rudy Marcelino, Maria de Fatima Bonaldo, Val C. Sheffield, Bridgette Rhoads, Todd E. Scheetz, Tamara A. Kucaba, Greg Doonan, Jim J.-C. Lin, Vladan Miljkovich, Jared M. Bischof, Ivana Sunjevaric, Joshua Rehmann, Barry Gackle, Lankai Guo, Ning Wu, Brian Thomas Mokrzycki
Language: English
Year of publication: 2004
Subject:
Source: Scheetz, T E, Laffin, J J, Berger, B, Holte, S, Baumes, S A, Brown, R, Chang, S, Coco, J, Conklin, J, Crouch, K, Donohue, M, Doonan, G, Estes, C, Eyestone, M, Fishler, K, Gardiner, J, Guo, L, Johnson, B, Keppel, C, Kreger, R, Lebeck, M, Marcelino, R, Miljkovich, V, Perdue, M, Qui, L, Rehmann, J, Reiter, R S, Rhoads, B, Schaefer, K, Smith, C, Sunjevaric, I, Trout, K, Wu, N, Birkett, C L, Bischof, J, Gackle, B, Gavin, A, Grundstad, A J, Mokrzycki, B, Moressi, C, O'Leary, B, Pedretti, K, Roberts, C, Robinson, N L, Smith, M, Tack, D, Trivedi, N, Kucaba, T, Freeman, T, Lin, J J-C, Bonaldo, M F, Casavant, T L, Sheffield, V C & Soares, M B 2004, 'High-throughput gene discovery in the rat', Genome Research, vol. 14, no. 4, pp. 733-41. https://doi.org/10.1101/gr.1414204
Description: Genomic resources have proven to be very useful in both human and mouse genetic studies. For example, the human UniGene set (Schuler 1997) and GeneMap '99 (Deloukas et al. 1998) at the National Center for Biotechnology Information (NCBI) were invaluable in the recent identification of two Bardet-Biedl syndrome genes (Nishimura et al. 2001; Mykytyn et al. 2002). The rat also provides several well-established physiological and biochemical models for the study of genetically complex human diseases, including hypertension (Hilbert et al. 1991; Jacob et al. 1991), renal disease (Brown et al. 1996), behavioral disorders (Moisan et al. 1996), and autoimmune disorders (Jacob et al. 1992). To make efficient use of the rat as a model for human disease and physiology, a first step is to identify a comprehensive set of genes, a process dubbed gene discovery. Large-scale production of expressed sequence tags (ESTs) from arrayed cDNA clones has proven to be the most efficient and cost-effective strategy for gene discovery. EST-based strategies are advantageous for several reasons, not least because purely computational methods of gene prediction are notoriously inaccurate in higher eukaryotes, in which only a small fraction of the genome codes for transcribed genes (Guigo et al. 2000). However, coupling EST and genomic sequence data is most desirable, as it allows the structure of each gene to be determined. ESTs are also invaluable for genome sequence annotation and for identifying orthologous relationships between sequences of different, and often evolutionarily distant, organisms. The latter is essential when using rat models to study human diseases. Typically, cDNA libraries used for EST production are oligo-dT-primed and directionally cloned.
Thus, the sequence obtained from the 3′ end of a cDNA clone, that is, the 3′ EST, corresponds to the 3′ end of the mRNA. Depending on the length of both the 3′ EST and the 3′ untranslated sequence of the mRNA, a 3′ EST may contain little or no coding sequence. Because 3′ untranslated sequences are, as a general rule, less conserved than coding regions (Makalowski and Boguski 1998) and are relatively long (750-bp average length; Pesole et al. 1999), a 3′ EST can serve as a fingerprint that unequivocally identifies a transcript. For this reason, each unique 3′ EST that contains a bona fide polyadenylation signal sequence and poly(A) tail can be tentatively considered to represent a different mRNA and, except in cases of alternative splicing and/or differential polyadenylation, a different transcription unit. Conversely, 5′ ESTs are derived from the 5′ ends of the cDNAs and may encompass 5′ untranslated, coding, and/or 3′ untranslated sequences, depending on the length of the EST and on whether the cDNA corresponds to a full-length or a truncated copy of the mRNA. Typically, however, 5′ ESTs span coding sequences and thus often reveal similarities between evolutionarily conserved sequences from different organisms. This is invaluable for identifying orthologous relationships and for assigning candidate functions to otherwise unknown transcripts. Note that ESTs are single-pass sequences and, as such, have an error rate of ∼3% (Hillier et al. 1996). This poses challenges for computational methods that identify nonredundant sets of ESTs, such as NCBI's UniGene collections. The EST approach to gene discovery has been successfully applied to a number of organisms (Adams et al. 1995; Hillier et al. 1996; Marra et al. 1999; Dimopoulos et al. 2000; Blackshear et al. 2001; Whitfield et al. 2002).
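The 3′ EST criterion described above (a bona fide polyadenylation signal followed by a poly(A) tail) can be expressed as a simple sequence filter. The Python sketch below is illustrative only: the signal hexamers (the canonical AATAAA and its common variant ATTAAA), the 8-nt minimum tail, the 35-nt upstream search window, and the helper names `reverse_complement` and `has_polya_signal` are assumptions introduced here, not parameters or code from this study.

```python
import re

# Illustrative thresholds (assumptions, not values from the paper).
MIN_TAIL = 8                     # minimum run of A's treated as a poly(A) tail
WINDOW = 35                      # nucleotides upstream of the tail searched for a signal
SIGNALS = ("AATAAA", "ATTAAA")   # canonical hexamer and its most common variant

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def has_polya_signal(est: str) -> bool:
    """Hypothetical filter: True if the read ends in a poly(A) tail that is
    preceded, within WINDOW nt, by one of the SIGNALS hexamers.

    Reads that begin with a poly(T) run (sequenced from the antisense strand)
    are reverse-complemented first, so both orientations are accepted.
    """
    seq = est.upper()
    if re.match(rf"T{{{MIN_TAIL},}}", seq):
        seq = reverse_complement(seq)
    tail = re.search(rf"A{{{MIN_TAIL},}}$", seq)
    if tail is None:
        return False
    upstream = seq[max(0, tail.start() - WINDOW):tail.start()]
    return any(signal in upstream for signal in SIGNALS)


# Toy read: signal hexamer, a spacer, then a 20-nt poly(A) tail.
read = "GGCATCCGTA" + "AATAAA" + "GCTTACGTTGCAGT" + "A" * 20
print(has_polya_signal(read), has_polya_signal(reverse_complement(read)))
```

Because 3′ reads from oligo-dT-primed clones are often sequenced from the antisense strand, they frequently start with a poly(T) run; the sketch handles that case by reverse-complementing. Exact string matching is a simplification: with a ∼3% single-pass error rate, a real pipeline would likely need to tolerate mismatches in both the tail and the signal.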
It should be acknowledged, however, that despite its advantages this approach has certain limitations, not least the redundant generation of ESTs derived from the most common transcripts, that is, mitochondrial RNAs, ribosomal RNAs, and mRNAs of the super-prevalent and intermediate frequency classes (Bishop et al. 1974). This problem can significantly impair the overall efficiency of a gene discovery program that relies solely on generating ESTs from cDNA clones randomly picked from standard (non-normalized) libraries. Accordingly, the use of normalized cDNA libraries, in which all clones are represented at comparable frequencies (Soares et al. 1994; Bonaldo et al. 1996), has proven most advantageous (Hillier et al. 1996; Marra et al. 1999; Dimopoulos et al. 2000; Blackshear et al. 2001; Whitfield et al. 2002). Normalization, however, only minimizes redundancy within a library, which makes it particularly effective at minimizing redundant identification of tissue-specific mRNAs. Redundant production of ESTs derived from ubiquitously expressed mRNAs, which recur across libraries, becomes a major problem in the intermediate to advanced phases of gene discovery programs. Hence, we have argued that this problem can be addressed more effectively by using subtractive libraries that are progressively enriched for novel ESTs (Bonaldo et al. 1996; Soares 1997). This is the rationale behind our strategy of generating ESTs from serially subtracted normalized libraries. Serial subtraction of normalized libraries is an iterative process in which arrayed sets of cDNAs, from which ESTs have already been derived, are pooled and used as a driver in a subtractive hybridization with a single normalized or subtracted library or with a pool of such libraries. Our cDNA clones contain library-specific sequence tags that enable computational identification of the library and tissue of origin of ESTs obtained from pooled libraries (Gavin et al. 2002). Because the representation of the driver population is significantly reduced in the resulting subtracted library, redundant generation of ESTs is greatly reduced. Hence, every new library in a series is enriched for novel and progressively rarer ESTs. Here we describe the use of this strategy to identify a comprehensive nonredundant collection of rat ESTs with unprecedented efficiency.
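The rationale for serial subtraction can be illustrated with a toy simulation. The sketch below is a deliberate idealization and not the authors' protocol: it assumes perfectly normalized libraries (every transcript in a library equally likely to be picked) and complete removal of driver sequences, and it models each library as a set of transcript IDs sharing a block of ubiquitously expressed transcripts. The function `serial_discovery`, its parameters, and the library sizes are hypothetical.

```python
import random


def serial_discovery(libraries, clones_per_library, subtract, seed=0):
    """Return the number of novel transcripts tagged by each library in a series.

    Idealized toy model (an assumption, not the authors' protocol):
    - each library is a set of transcript IDs and is perfectly normalized,
      so every transcript it contains is equally likely to be picked;
    - with subtract=True, every transcript already tagged by an earlier EST
      (the pooled driver) is removed completely before the library is sampled.
    """
    rng = random.Random(seed)
    tagged = set()  # transcripts already represented by at least one EST
    novel_counts = []
    for library in libraries:
        # Sort so that sampling is reproducible for a given seed.
        pool = sorted(library - tagged) if subtract else sorted(library)
        # Pick clones with replacement; repeat picks within a library collapse in the set.
        picks = {rng.choice(pool) for _ in range(clones_per_library)} if pool else set()
        novel_counts.append(len(picks - tagged))
        tagged |= picks
    return novel_counts


# Hypothetical series: 500 ubiquitous transcripts shared by four libraries,
# each library also carrying 300 tissue-specific transcripts of its own.
ubiquitous = set(range(500))
libraries = [ubiquitous | set(range(1000 * k, 1000 * k + 300)) for k in range(1, 5)]

print("unsubtracted:", serial_discovery(libraries, 400, subtract=False))
print("subtracted:  ", serial_discovery(libraries, 400, subtract=True))
```

Under these assumptions, the unsubtracted series keeps spending clones on the shared (ubiquitous) transcripts in every new library, whereas the subtracted series samples only transcripts not yet in the driver, which is the effect the strategy relies on. Real subtractive hybridization is not complete, so the gains in practice would be expected to be smaller than this model suggests.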
Database: OpenAIRE