A MULTI-STRATEGY APPROACH TO INFORMATIVE GENE IDENTIFICATION FROM GENE EXPRESSION DATA
Authors: | Anne E.G. Lenferink, Catherine Collins, Ziying Liu, Youlian Pan, Fazel Famili, Maureen D. O'Connor-McCourt, Sieu Phan, Christiane Cantin |
---|---|
Year of publication: | 2010 |
Subject: |
Multi-strategy learning; Decision tree; Genomics; Gene expression data analysis; Biology; Biochemistry; Field (computer science); Mice; Artificial Intelligence; Transforming Growth Factor beta; Databases, Genetic; Animals; Humans; Molecular Biology; Gene; Selection (genetic algorithm); Oligonucleotide Array Sequence Analysis; Gene Expression Profiling; Decision Trees; Computational Biology; Computer Science Applications; Leukemia, Myeloid, Acute; Identification (information); Cell Transformation, Neoplastic; Data mining and knowledge discovery; Data analysis; Data mining |
Source: | Journal of Bioinformatics and Computational Biology. :19-38 |
ISSN: | 1757-6334 0219-7200 |
DOI: | 10.1142/s0219720010004495 |
Description: | An unsupervised multi-strategy approach has been developed to identify informative genes from high-throughput genomic data. Several statistical methods are used in the field to identify differentially expressed genes, but because different methods generate different gene lists, it is very challenging to determine the most reliable list and the most appropriate method. This paper presents a multi-strategy method in which several data analysis techniques are applied to a given dataset and a confidence measure is established to select genes from the lists these techniques generate; the selected genes form the core of our final selection. The remaining genes, which form the peripheral region, are then either excluded from or included in the final selection. The paper demonstrates this methodology by applying it to an in-house cancer genomics dataset and a public dataset. The results indicate that our method provides a more reliable gene list, which was validated using biological knowledge, biological experiments, and literature search. We further evaluated the method by consolidating two pairs of independent datasets; each pair concerns the same disease but was generated by different labs using different platforms. In this evaluation, our method produced far better results. |
Database: | OpenAIRE |
External link: |
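The core/peripheral selection described in the abstract can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the record does not define the confidence measure, so here a gene's confidence is assumed to be the fraction of methods whose gene lists contain it. The function name `partition_genes`, the parameter `core_threshold`, the method names, and the gene lists are all hypothetical. The rule for including or excluding peripheral genes is also not given in the record, so the sketch only separates them out for a later decision step.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def partition_genes(
    gene_lists: Dict[str, List[str]],
    core_threshold: float = 0.5,
) -> Tuple[Set[str], Set[str], Dict[str, float]]:
    """Split genes selected by several methods into core and peripheral sets.

    Assumption (hypothetical, not from the paper): a gene's confidence is the
    fraction of methods whose gene list contains it.
    """
    n_methods = len(gene_lists)
    if n_methods == 0:
        raise ValueError("need at least one gene list")
    # Count each gene at most once per method, even if a list repeats it.
    votes = Counter(g for genes in gene_lists.values() for g in set(genes))
    confidence = {g: c / n_methods for g, c in votes.items()}
    # Core: genes whose confidence reaches the threshold.
    core = {g for g, conf in confidence.items() if conf >= core_threshold}
    # Peripheral: chosen by at least one method but below the threshold;
    # these still need a separate include/exclude decision.
    peripheral = set(confidence) - core
    return core, peripheral, confidence


if __name__ == "__main__":
    # Toy gene lists from three hypothetical methods.
    lists = {
        "t_test": ["TGFB1", "SMAD3", "MYC"],
        "fold_change": ["TGFB1", "SMAD3", "CDH1"],
        "decision_tree": ["TGFB1", "VIM"],
    }
    core, peripheral, conf = partition_genes(lists, core_threshold=2 / 3)
    print(sorted(core))        # ['SMAD3', 'TGFB1']
    print(sorted(peripheral))  # ['CDH1', 'MYC', 'VIM']
```

A simple vote fraction keeps the sketch method-agnostic. Any list-producing technique can contribute, and raising `core_threshold` trades a smaller core for higher agreement between methods.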