Description: |
Improvements in deoxyribonucleic acid (DNA) microarray technology mean that thousands of genes can be profiled simultaneously, quickly and efficiently. DNA microarrays are increasingly being used for prediction and early diagnosis in cancer treatment. Feature selection and classification play a pivotal role in this process. The correct identification of an informative subset of genes may lead directly to putative drug targets. These genes can also be used as an early diagnosis or predictive tool. However, the large number of features (many thousands) in a typical dataset presents a formidable barrier to feature selection efforts. Many approaches have been presented in the literature for feature selection in such datasets. Most of them use classical statistical approaches (e.g. correlation). Classical statistical approaches, although fast, are incapable of detecting non-linear interactions between features of interest. Evolutionary Algorithms (EAs), by contrast, are inherently capable of taking non-linear interactions into account and are therefore very promising for feature selection in such datasets. It has been shown that dimensionality reduction increases the efficiency of feature selection in large and noisy datasets such as DNA microarray data. The two-phase Evolutionary Algorithm/k-Nearest Neighbours (EA/k-NN) algorithm is a promising approach that carries out initial dimensionality reduction as well as feature selection and classification. This thesis further investigates the two-phase EA/k-NN algorithm and introduces an adaptive weights scheme for the k-Nearest Neighbours (k-NN) classifier. It also introduces a novel weighted centroid classification technique and a correlation-guided mutation approach. Results show that the weighted centroid approach is capable of outperforming the EA/k-NN algorithm across five large biomedical datasets.
The thesis also identifies promising new areas of research that would complement the techniques introduced and investigated. |