Mouse BAC Ends Quality Assessment and Sequence Analyses

Authors: Margaret Krol, Elizabeth Gebregeorgis, Daniel A. Russell, Joel A. Malek, Bola Ayodeji, Keita Geer, Alla Shvartsbeyn, Shaying Zhao, Larry Overton, George Dimitrov, Getahun Tsegaye, Jyoti Shetty, Lingxia Jiang, Tamara Feldblyum, William C. Nierman, Kevin Tran, Claire M. Fraser, Sofiya Shatsman
Language: English
Year of publication: 2001
Description: Because of their high stability (Shizuya et al. 1992; Kim et al. 1996a,b), libraries constructed in bacterial artificial chromosome (BAC) vectors have become the standard clone sets for high-throughput genomic sequencing of organisms with large genomes. End sequences from BACs provide highly specific markers. Venter et al. (1996) described a genome sequencing approach in which a clone contig is extended in each direction by searching the finished sequence of a BAC against a BAC end sequence (BES) database and selecting the minimally overlapping clones. Because BACs (with an average insert size of 150 kb) are large enough to span most tandem arrays of homology units and repeats, BESs are useful for genome assembly and chromosome walking, and they have been used extensively to confirm, join, and order existing contigs (International Human Genome Sequencing Consortium 2001a). The whole-genome shotgun sequencing strategy relies on BESs as the primary scaffold onto which the end sequences of smaller clones are assembled (Venter et al. 1998, 2001).

The mouse and human share many fundamental biological processes. Consequently, the mouse has been used frequently in medical research and is the best model system for studying human disease. In addition, the mouse genome sequence facilitates accurate annotation of the human genome. For these reasons, the National Institutes of Health (NIH) launched a mouse genome sequencing project in October 1999 (http://www.nhgri.nih.gov/NEWS/MouseRelease.htm). Compared with the human, significantly fewer large-scale mapping efforts have been conducted for the mouse, and far less data are available to the community (Hudson et al. 1995; Dietrich et al. 1996; Schuler et al. 1996; McCarthy et al. 1997; Stewart et al. 1997; Deloukas et al. 1998; Van Etten et al. 1999; International Human Genome Mapping Consortium 2001a; Olivier et al. 2001).
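The contig-extension approach of Venter et al. (1996) mentioned above can be illustrated with a small sketch. This is a simplified illustration under stated assumptions, not the procedure used in that work or at TIGR. The `BesHit` record, the `pick_minimal_overlaps` function, the `min_overlap` threshold, and all clone names and coordinates are hypothetical. Each BES alignment to the finished seed BAC is assumed to be reduced to two values: the seed coordinate of the clone end (the first base of the end read) and the strand on which the read runs into the insert. Because an end read is sequenced from the vector into the insert, a `+` hit implies the clone extends past the seed's right end, and a `-` hit implies it extends past the left end. A clone whose two ends both align inside the seed is treated as contained.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class BesHit:
    """A BAC end sequence (BES) aligned to the finished seed BAC (hypothetical record)."""
    clone: str     # clone name (made-up identifiers)
    strand: str    # '+' if the BES reads toward higher seed coordinates, '-' otherwise
    boundary: int  # 0-based seed coordinate of the clone end (first base of the BES)


def pick_minimal_overlaps(
    hits: List[BesHit], seed_len: int, min_overlap: int = 0
) -> Tuple[Optional[str], Optional[str]]:
    """Return (left_clone, right_clone): for each direction, the clone that
    extends past that end of the seed while overlapping it the least."""
    # A clone with both ends inside the seed is contained and cannot extend it.
    ends_in_seed = Counter(h.clone for h in hits)

    best_left: Optional[Tuple[int, str]] = None   # (overlap in bp, clone)
    best_right: Optional[Tuple[int, str]] = None
    for h in hits:
        if ends_in_seed[h.clone] > 1:
            continue
        if h.strand == "+":
            # The insert runs rightward from the boundary, past the seed's right end.
            overlap = seed_len - h.boundary
            if overlap >= min_overlap and (best_right is None or overlap < best_right[0]):
                best_right = (overlap, h.clone)
        else:
            # The insert runs leftward from the boundary, past the seed's left end.
            overlap = h.boundary + 1
            if overlap >= min_overlap and (best_left is None or overlap < best_left[0]):
                best_left = (overlap, h.clone)

    return (
        best_left[1] if best_left else None,
        best_right[1] if best_right else None,
    )


# Toy example with made-up clones and coordinates (not real data).
SEED_LEN = 150_000
example_hits = [
    BesHit("A", "+", 140_000),
    BesHit("B", "+", 120_000),
    BesHit("C", "-", 5_000),
    BesHit("D", "-", 30_000),
    BesHit("E", "+", 10_000),   # E has both ends in the seed, so it is contained
    BesHit("E", "-", 100_000),
]
print(pick_minimal_overlaps(example_hits, SEED_LEN))  # ('C', 'A')
```

In this sketch, a clone whose other end simply failed to sequence would be indistinguishable from a genuinely extending clone, so a real pipeline would likely need further checks. The `min_overlap` parameter is a stand-in for whatever overlap a project requires before trusting a join.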
A large-scale BAC end-sequencing project rapidly and inexpensively generates an extensive set of random markers across the genome. Such a project will be crucial to the success of the combined strategy being used for the mouse genome: BAC-based sequencing together with a moderate level of whole-genome shotgun sequencing. The Institute for Genomic Research (TIGR) is the only center conducting large-scale BAC end sequencing for the mouse. The aim of the project is to generate accurate BES pairs from 170,000 RPCI-23 clones (Osoegawa et al. 2000) and 130,000 RPCI-24 clones in support of the mouse genome sequencing project. The same set of clones has been fingerprinted at the Genome Sequencing Centre of the British Columbia Cancer Research Centre in Vancouver, Canada (http://www.bcgsc.bc.ca/projects/mouse_mapping/). We are nearing the goal of the project, having generated ∼450,000 sequences (http://www.tigr.org/tdb/bac_ends/mouse/bac_end_intro.html). To better characterize this valuable resource, we conducted the comprehensive quality assessment and sequence analyses described below.
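For scale, assume each clone contributes at most two end sequences (one from each end of the insert), and assume every generated sequence counts toward the target. Under those assumptions, the stated clone targets correspond to at most 600,000 end sequences, and ∼450,000 sequences is roughly three quarters of that maximum:

$$2 \times (170{,}000 + 130{,}000) = 600{,}000, \qquad \frac{450{,}000}{600{,}000} = 0.75$$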
Database: OpenAIRE