Dataset complexity in gene expression based cancer classification using ensembles of k-nearest neighbors

Authors:	Helen Priisalu, Oleg Okun
Year of publication:	2009
Subject:	Cancer classification; Computer science; Gene Expression; Medicine (miscellaneous); Pattern recognition; Dependence relation; k-nearest neighbors algorithm; Copula (probability theory); Ensembles of classifiers; Bivariate data; Artificial Intelligence; Neoplasms; Humans; Data mining; Classifier (UML); Test data
Source:	Artificial Intelligence in Medicine. 45:151-162
ISSN:	0933-3657
DOI:	10.1016/j.artmed.2008.08.004
Description:	Objective: We explore the link between dataset complexity, which determines how difficult a dataset is to classify, and classification performance, measured by the low-variance, low-bias bolstered resubstitution error of k-nearest neighbor classifiers. Methods and material: Gene expression based cancer classification is the task in this study. Six gene expression datasets covering different types of cancer constitute the test data. Results: Through extensive simulation coupled with the copula method for analyzing association in bivariate data, we show that dataset complexity and bolstered resubstitution error are dependent. As a result, we propose a new scheme for generating ensembles of classifiers that selects low-complexity feature subsets for the ensemble members; according to the dependence relation found, these are the accurate members. Conclusion: Experiments with six gene expression datasets demonstrate that our ensemble generating scheme, based on the dependence between dataset complexity and classification error, is superior both to the single best classifier in the ensemble and to the traditional ensemble construction scheme that ignores dataset complexity.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::d5f03988eb624b844f4568fd0af78a74 https://doi.org/10.1016/j.artmed.2008.08.004
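
Illustration:	The abstract describes building a k-nearest neighbor ensemble from feature subsets of low dataset complexity. The Python sketch below shows that general idea only and is not the authors' implementation. The record does not say which complexity measure the paper uses, so a score derived from Fisher's discriminant ratio stands in here as an assumption, and plain majority voting replaces any evaluation by bolstered resubstitution error. All function names and parameters (fisher_complexity, build_ensemble, n_candidates, and so on) are hypothetical, and class labels are assumed to be 0 and 1.

```python
import numpy as np


def fisher_complexity(X, y):
    """Stand-in complexity score in (0, 1]; lower means easier to separate.

    Assumption: uses the maximum per-feature Fisher discriminant ratio,
    which may differ from the measure used in the paper.
    """
    a, b = X[y == 0], X[y == 1]
    num = (a.mean(axis=0) - b.mean(axis=0)) ** 2
    den = a.var(axis=0) + b.var(axis=0) + 1e-12  # guard against zero variance
    return 1.0 / (1.0 + np.max(num / den))


def knn_predict(X_train, y_train, X_test, k=3):
    """Plain k-NN with squared Euclidean distance and majority vote (odd k)."""
    d = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return (y_train[nearest].mean(axis=1) > 0.5).astype(int)


def build_ensemble(X, y, n_candidates=50, subset_size=5, n_members=5, seed=0):
    """Draw random feature subsets and keep the n_members least complex ones."""
    rng = np.random.default_rng(seed)
    candidates = [
        rng.choice(X.shape[1], size=subset_size, replace=False)
        for _ in range(n_candidates)
    ]
    candidates.sort(key=lambda idx: fisher_complexity(X[:, idx], y))
    return candidates[:n_members]


def ensemble_predict(X_train, y_train, X_test, subsets, k=3):
    """Majority vote over one k-NN per feature subset (odd number of members)."""
    votes = np.stack(
        [knn_predict(X_train[:, s], y_train, X_test[:, s], k) for s in subsets]
    )
    return (votes.mean(axis=0) > 0.5).astype(int)


if __name__ == "__main__":
    # Synthetic stand-in for expression data: 60 samples, 200 "genes",
    # of which the first three carry class signal.
    rng = np.random.default_rng(42)
    y = np.array([0] * 30 + [1] * 30)
    X = rng.normal(size=(60, 200))
    X[:, :3] += 2.0 * y[:, None]
    order = rng.permutation(60)
    train, test = order[:40], order[40:]
    subsets = build_ensemble(X[train], y[train])
    pred = ensemble_predict(X[train], y[train], X[test], subsets)
    print("held-out accuracy:", (pred == y[test]).mean())
```

Ranking candidate subsets by complexity before any classifier is trained is the point of the sketch: the dependence reported in the abstract is what would justify using complexity as a proxy for member accuracy.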