Popis: |
The aim of this work was to predict protein secondary structure using neural network models. Initially a Hopfield network was used, but it was abandoned in favour of a layered network trained using the back-propagation algorithm. In the early stages of this work, several different approaches to the problem were explored. These included attempts to predict boundaries between secondary structures, the secondary structures of individual residues, and the secondary structures of sequences lying wholly within a particular secondary structure. Results indicated the last of these to be the best approach to continue with. In addition, two coding schemes were investigated: one based on occurrences of pairs of residues and one based on the positions of residues. The positional coding scheme was found to be the natural coding scheme for this problem. On segments of whole alpha-helix and whole non-alpha-helix 10 residues in length, a prediction success of around 80% with a correlation coefficient of 0.52 was achieved with the positional coding scheme. On whole proteins, where predictions are made for individual residues, alpha-helix prediction success drops to 73% with a correlation coefficient of 0.34. The relative predictability of alpha-helices of above and below average accessibility was also investigated, showing that those of above average accessibility are more predictable than those of below average accessibility. The main body of this work concerns the apparent limit of predictability of alpha-helices. It was found that test set prediction did not depend on the number of hidden nodes. In fact, a single-layer network performed as well as those with hidden nodes, showing that the problem is basically linearly separable. In addition, prediction success plateaus well below perfect prediction. During training, test set prediction success is seen to peak and then decrease.
The decrease in prediction success was found to be due to non-alpha-helix sequences that the network was unable to distinguish from real alpha-helix sequences. These regions of non-alpha-helix were shown, with statistical significance, to occur adjacent to actual alpha-helices. It is proposed that potential alpha-helices are disrupted by global constraints during the formation of tertiary structure. The effect of window size was also investigated, as was beta-sheet prediction, although the latter was found to be limited by the small number of examples available with our approach. However, the distribution of beta-sheet in the input space in relation to alpha-helix and coil was determined. |
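
The abstract does not spell out the positional coding scheme or the formula behind the reported correlation coefficients, so the following is only a minimal sketch under stated assumptions. It assumes that positional coding means one unit per amino acid type at each window position (a one-hot code over the 20 standard residues), and that the correlation coefficient is the Matthews coefficient commonly used in secondary structure prediction. The names `positional_encode` and `correlation_coefficient` and the example segment are hypothetical, chosen for illustration.

```python
import math

# Assumption: the 20 standard amino acids, in one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def positional_encode(segment):
    """One-hot encode each residue by position.

    A segment of n residues becomes n * 20 binary inputs, with exactly
    one unit set per position (assumed form of the positional coding).
    """
    vector = []
    for residue in segment:
        units = [0] * len(AMINO_ACIDS)
        units[AMINO_ACIDS.index(residue)] = 1
        vector.extend(units)
    return vector


def correlation_coefficient(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Assumed to be the statistic reported in the abstract; returns 0.0
    when the denominator vanishes.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical 10-residue segment, matching the segment length in the text.
encoded = positional_encode("AEELLKKAGV")
```

Under these assumptions a 10-residue segment yields 200 inputs. Feeding them directly to a single output unit would correspond to the single-layer network that the abstract reports performing as well as networks with hidden nodes.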