Author:
Yang Mary, Bu Hua-Long, Li Guo-Zheng, Zeng Xue-Qiang, Yang Jack Y
Language:
English
Year of publication:
2008
Subject:
Source:
BMC Genomics, Vol 9, Iss Suppl 2, p S24 (2008)
Document type:
article
ISSN:
1471-2164
DOI:
10.1186/1471-2164-9-S2-S24
Description:
Abstract Background Dimension reduction is a critical issue in the analysis of microarray data, because the high dimensionality of gene expression microarray data sets hurts the generalization performance of classifiers. Dimension reduction methods fall into two types: feature selection and feature extraction. Principal component analysis (PCA) and partial least squares (PLS) are two frequently used feature extraction methods, and in previous work the top several components of PCA or PLS were selected for modeling according to the descending order of eigenvalues. In this paper, we show that not all of the top components are useful; instead, features should be selected from all the components by feature selection methods. Results We demonstrate a framework for selecting feature subsets from all the newly extracted components, leading to reduced classification error rates on gene expression microarray data. We consider both an unsupervised method, PCA, and a supervised method, PLS, for extracting new components; genetic algorithms for feature selection; and support vector machines and k-nearest neighbors for classification. Experimental results show that the proposed framework effectively selects feature subsets and reduces classification error rates. Conclusion Not only the top features newly extracted by PCA or PLS are important; feature selection should therefore be performed to select subsets from the new features, improving the generalization performance of classifiers.
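The framework described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the authors' code: a simple random-subset search stands in for their genetic algorithm, and the synthetic data set, subset size, and seed are all assumptions.

```python
# Sketch: extract ALL PCA components, then search for a good component
# subset with a classifier in the loop, instead of keeping only the top-k
# components by eigenvalue order. Random-subset search is a GA stand-in.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic stand-in for a microarray data set: many features, few samples.
X, y = make_classification(n_samples=100, n_features=500,
                           n_informative=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

pca = PCA().fit(X_tr)  # keep every component, not only the top ones
Z_tr, Z_te = pca.transform(X_tr), pca.transform(X_te)
n_comp = Z_tr.shape[1]

def subset_score(mask):
    """Cross-validated linear-SVM accuracy on the chosen components."""
    return cross_val_score(SVC(kernel="linear"), Z_tr[:, mask], y_tr, cv=5).mean()

# Baseline: the conventional choice, top-10 components by eigenvalue order.
best_mask, best = np.arange(10), subset_score(np.arange(10))

# Search subsets drawn from ALL components (GA stand-in).
for _ in range(50):
    mask = rng.choice(n_comp, size=10, replace=False)
    s = subset_score(mask)
    if s > best:
        best_mask, best = mask, s

clf = SVC(kernel="linear").fit(Z_tr[:, best_mask], y_tr)
acc = clf.score(Z_te[:, best_mask], y_te)
print("selected components:", sorted(int(i) for i in best_mask))
print("test accuracy:", round(acc, 3))
```

When a subset drawn from all components beats the top-10 baseline in cross-validation, it illustrates the paper's point that components useful for classification are not necessarily those with the largest eigenvalues.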
Database:
Directory of Open Access Journals
External link: