PATTERNS OF DIPEPTIDE USAGE FOR GENE PREDICTION

Author: Gangadharaiah, Dayananda Sagar
Language: English
Year of publication: 2010
Subject:
Document type: Text
Description: As the number of completely sequenced genomes continues to grow rapidly, the identification of gene regions in DNA sequence data remains one of the most important open problems in bioinformatics. Improving the accuracy of gene-finding tools by even a small percentage would affect the predictions for many of an organism's genes (Zhu et al., 2010). This thesis presents a novel approach for identifying the coding regions of a genome based on dipeptide usage. Patterns in dipeptide usage are used to discriminate between coding and non-coding DNA regions. Two-sample t-tests serve as tests of significance to determine which dipeptides differ significantly in their occurrence between coding and non-coding regions. The methods were tested primarily on the Escherichia coli 536 genome, where they reached an accuracy of 96.5% in identifying coding regions and 100% in identifying non-coding regions. The classifier trained on the Escherichia coli 536 genome was then used to predict the coding and non-coding regions of the Salmonella enterica subsp. enterica serovar Typhi genome. These experiments showed an accuracy of 79.5% in predicting coding regions and 100% in predicting non-coding regions of that genome.
Database: Networked Digital Library of Theses & Dissertations
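
The description above mentions comparing dipeptide usage between coding and non-coding regions with two-sample t-tests. The following is a minimal sketch of that general idea, not the thesis's actual implementation. Everything in it is an assumption made for illustration: the function names, translating in a single reading frame with the standard genetic code, using the Welch (unequal-variance) form of the t statistic, and the cutoff of 2.0 on |t|.

```python
from collections import Counter
from itertools import product
from math import sqrt
from statistics import mean, variance

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate reading frame 0; '*' marks a stop, 'X' an unknown codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def dipeptide_frequencies(dna):
    """Relative frequency of each overlapping amino-acid pair in the translation."""
    protein = translate(dna)
    pairs = [protein[i:i + 2] for i in range(len(protein) - 1)]
    # Pairs that span a stop or an unknown codon are skipped (an assumption).
    pairs = [p for p in pairs if "*" not in p and "X" not in p]
    counts = Counter(pairs)
    total = sum(counts.values())
    return {p: n / total for p, n in counts.items()} if total else {}


def welch_t(a, b):
    """Two-sample t statistic (Welch form); each sample needs >= 2 values."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    diff = mean(a) - mean(b)
    if se == 0:
        # Both samples are constant: no spread to compare against.
        return 0.0 if diff == 0 else float("inf") * (1 if diff > 0 else -1)
    return diff / se


def discriminating_dipeptides(coding, noncoding, threshold=2.0):
    """Dipeptides whose |t| exceeds a hypothetical threshold, with their t values."""
    cod = [dipeptide_frequencies(s) for s in coding]
    non = [dipeptide_frequencies(s) for s in noncoding]
    seen = set().union(*cod, *non)
    scores = {}
    for dp in seen:
        t = welch_t([f.get(dp, 0.0) for f in cod],
                    [f.get(dp, 0.0) for f in non])
        if abs(t) > threshold:
            scores[dp] = t
    return scores
```

In this sketch a positive t value means the dipeptide is more frequent in the coding sample. A real analysis would convert t to a p-value with the appropriate degrees of freedom and correct for testing up to 400 dipeptides at once; both steps are omitted here.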