Popis: |
The classification of peptides, drawn from protein sequences, is a common problem in bioinformatics research. Examples of peptide classification problems include the identification of viral protein binding sites, which may be suitable targets for drugs, and the identification of signal peptides, which are important in regulating processes within cells. Many computational approaches have been applied to peptide classification, ranging from relatively simple statistical analysis, through template-based pattern matching, to state-of-the-art machine learning techniques. In recent years, artificial neural networks and support vector machines have been widely applied to this class of problems with significant success. The bio-basis function neural network (BBFNN) is a further method designed specifically for the classification of peptides and protein sequences. It makes use of amino acid scoring matrices and support peptides to implement a homology-based neural network. The method accepts sequence data directly as input, without the need to encode peptides numerically. This thesis examines the limitations of the bio-basis function neural network and attempts to address them. An evolved matrix bio-kernel network is introduced which can create problem-specific amino acid scoring matrices. This removes the need to select an appropriate matrix from the standard matrices available, which are generally optimised for sequence database searches rather than peptide classification. A sparse Bayesian bio-kernel network is also introduced, to address the problem of selecting support peptides for a bio-basis function network. Sparse Bayesian learning is used to automatically create parsimonious models. The methods are applied to five previously published peptide datasets. Problem-specific evolved scoring matrices are shown to increase classification accuracies on larger datasets.
Sparse Bayesian networks offer improved classification accuracy over the standard BBFNN while also reducing model size. The addition of position-specific weights to the bio-kernel results in further reductions in model size. Trained models are examined, and patterns are identified which match previous laboratory and computational findings. |