E. coli Promoter Prediction Using Expectation Maximization Algorithms, Fuzzy Sets and Neural Networks

Author: Chun-Cheng Peng, 彭俊澄
Year of publication: 2003
Document type: 學位論文 ; thesis
Description: 91
Escherichia coli K12 was sequenced in 1997. Its 4,639,221-base-pair DNA sequence contains 4,288 annotated protein-coding genes, 38 percent of which have no attributed function. In prokaryotic promoter prediction, one of the major problems is how to locate the spacers between the -35 box and the -10 box and between the -10 box and the transcription start site. In this thesis, promoter regions are located accurately with an expectation maximization (EM) algorithm, and the most representative features are then used to train neural networks. Most related studies, by contrast, use a wider range of training sequences directly, which places an extremely heavy load on both computation and memory. If the EM-extracted features were still encoded with the traditional orthogonal coding method, this heavy burden could not be avoided. We therefore develop a new purine-pyrimidine encoding method. It greatly reduces the dimensionality of the training data, and simulation results show that its promoter prediction precision is approximately equal to that obtained with the traditional orthogonal encoding method.
Database: Networked Digital Library of Theses & Dissertations
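
The abstract contrasts orthogonal coding with a purine-pyrimidine encoding but does not give the details of either. The sketch below is only an illustration of why the second can shrink the input: it assumes the common four-bits-per-base form of orthogonal coding and a hypothetical one-value-per-base purine-pyrimidine mapping (purines A and G to 1, pyrimidines C and T to 0). The function names, the bit order, and the 1/0 assignment are assumptions made for this example and are not taken from the thesis.

```python
# Illustrative sketch only: the thesis's actual encodings are not specified
# in the abstract, so every mapping below is an assumption.

# Assumed orthogonal (one-hot) coding: four input values per base.
ORTHOGONAL = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}

# Purines are adenine and guanine; pyrimidines are cytosine and thymine.
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def encode_orthogonal(seq):
    """Return a flat list with four values per base (assumed bit order A, C, G, T)."""
    encoded = []
    for base in seq.upper():
        if base not in ORTHOGONAL:
            raise ValueError(f"unexpected base: {base!r}")
        encoded.extend(ORTHOGONAL[base])
    return encoded


def encode_purine_pyrimidine(seq):
    """Return a flat list with one value per base: 1 for a purine, 0 for a pyrimidine.

    This 1/0 assignment is hypothetical; the thesis may use a different scheme.
    """
    encoded = []
    for base in seq.upper():
        if base in PURINES:
            encoded.append(1)
        elif base in PYRIMIDINES:
            encoded.append(0)
        else:
            raise ValueError(f"unexpected base: {base!r}")
    return encoded


if __name__ == "__main__":
    hexamer = "TATAAT"  # a short example sequence
    print(len(encode_orthogonal(hexamer)))        # 24 input values
    print(encode_purine_pyrimidine(hexamer))      # 6 input values
```

Under these assumptions a sequence of n bases needs 4n network inputs with orthogonal coding and n inputs with the purine-pyrimidine mapping, which is the kind of dimensionality reduction the abstract describes.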