Description: |
The Human Genome Project is entering the large-scale sequencing phase. During the next few years, millions of bases will be sequenced daily in genome centers worldwide, and, in order to analyze them, methods to reliably predict the genes encoded in genomic sequences are becoming essential. As the databases of known coding sequences increase in size, gene prediction methods based on sequence similarity to coding sequences (mainly proteins and ESTs) are becoming increasingly useful, and they are routinely used to identify putative genes in anonymous genomic sequences (see, for instance, The C. elegans Sequencing Consortium, 1998). There is little systematic knowledge, however, about the accuracy of sequence-similarity-based gene predictions, in particular about the ability of these methods to correctly infer the exonic structure of genes in higher eukaryotic organisms. In this chapter, we will address this shortcoming by evaluating the accuracy of gene predictions derived exclusively from sequence-similarity database searches. In practice, we will use two programs from the popular BLAST suite (Altschul et al., 1990; Altschul and Gish, 1996): BLASTX (Gish and States, 1993), using a |