Ensembles of Nearest Neighbors for Gene Expression Based Cancer Classification

Authors: Oleg Okun, Helen Priisalu
Year of publication: 2008
Subject:
Source: Supervised and Unsupervised Ensemble Methods and their Applications ISBN: 9783540789802
DOI: 10.1007/978-3-540-78981-9_6
Description: Gene expression levels are useful for discriminating between cancer and normal samples and/or between different types of cancer. In this chapter, ensembles of k-nearest neighbor (k-NN) classifiers are employed for gene expression based cancer classification. The ensembles are created by randomly sampling subsets of genes, assigning each subset to a k-NN classifier, and finally combining the k-NN predictions by majority vote. Selection of subsets is governed by the statistical dependence between dataset complexity and classification error, confirmed by the copula method, so that the least complex subsets are preferred, since they are associated with more accurate predictions. Experiments carried out on six gene expression datasets show that our ensemble scheme is superior both to the single best classifier in the ensemble and to a redundancy-based filter designed specifically to remove irrelevant genes.
Database: OpenAIRE
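
The ensemble scheme outlined in the description (random gene subsets, one k-NN per subset, preference for the least complex subsets, majority vote) can be illustrated with a minimal Python sketch. It is not the authors' implementation. This record does not say which dataset complexity measure the chapter uses, so the `complexity` function below is an assumed stand-in based on a two-class Fisher ratio, and the copula-based dependence analysis is not reproduced. All function names and parameters (`knn_predict`, `build_ensemble`, `n_candidates`, `subset_size`, `n_members`) are hypothetical.

```python
import random
from collections import Counter
from statistics import mean, pvariance


def knn_predict(train_X, train_y, x, genes, k=3):
    """Classify x with k-NN, using squared Euclidean distance over `genes` only."""
    neighbors = sorted(
        (sum((row[g] - x[g]) ** 2 for g in genes), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


def complexity(train_X, train_y, genes):
    """ASSUMED complexity proxy for two classes: 1 / (1 + max Fisher ratio).

    Lower values mean the classes are easier to separate on this gene subset.
    """
    first, second = sorted(set(train_y))[:2]
    best = 0.0
    for g in genes:
        a = [row[g] for row, y in zip(train_X, train_y) if y == first]
        b = [row[g] for row, y in zip(train_X, train_y) if y == second]
        spread = pvariance(a) + pvariance(b) + 1e-12  # avoid division by zero
        best = max(best, (mean(a) - mean(b)) ** 2 / spread)
    return 1.0 / (1.0 + best)


def build_ensemble(train_X, train_y, n_candidates=50, subset_size=2,
                   n_members=5, seed=0):
    """Sample random gene subsets and keep the least complex ones."""
    rng = random.Random(seed)
    n_genes = len(train_X[0])
    # Draw candidate subsets; the set removes duplicate draws.
    candidates = {
        tuple(sorted(rng.sample(range(n_genes), subset_size)))
        for _ in range(n_candidates)
    }
    # Rank by the assumed complexity proxy; ties are broken by gene indices.
    ranked = sorted(candidates,
                    key=lambda g: (complexity(train_X, train_y, g), g))
    return ranked[:n_members]


def ensemble_predict(train_X, train_y, x, ensemble, k=3):
    """Combine the per-subset k-NN predictions by majority vote."""
    votes = Counter(knn_predict(train_X, train_y, x, genes, k)
                    for genes in ensemble)
    return votes.most_common(1)[0][0]
```

An odd number of ensemble members and an odd k avoid voting ties in the two-class case. With more than two classes, both the complexity proxy and the tie handling would need to be revisited.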