Some statistical properties of gene expression clustering for array data

Authors: Abreu, G C G, Pinheiro, A, Drummond, R D, Camargo, S R, Menossi, M
Language: English
Publication year: 2010
Subject:
Source: Abreu, G C G, Pinheiro, A, Drummond, R D, Camargo, S R & Menossi, M 2010, 'Some statistical properties of gene expression clustering for array data', Advances and Applications in Statistics, vol. 14, no. 2, pp. 191-204.
Description: DNA arrays have been a rich source of data for the study of genomic expression in a wide variety of biological systems. Gene clustering is a widely used paradigm for assessing the significance of a gene (or group of genes). However, most gene clustering techniques are applied to cDNA array data without a corresponding statistical error measure. We propose an easy-to-implement and simple-to-use technique that uses bootstrap re-sampling to evaluate the statistical error of the nodes provided by clustering based on self-organizing maps (SOM). Comparisons between SOM and parametric clustering are presented for simulated data as well as for two real data sets. We also implement a bootstrap-based pre-processing procedure for SOM that improves the false discovery ratio of differentially expressed genes. Code in Matlab is freely available, along with some supplementary material, at the following address: https://ipe.cbmeg.unicamp.br/pub/abreu.gcg. Code implementation in R is in progress. Publication date: February
Database: OpenAIRE
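
Illustrative sketch (not part of the record): the description mentions using bootstrap re-sampling to attach an error measure to SOM nodes. The Python below is one possible reading of that idea, not the authors' Matlab code. Everything in it is an assumption made for illustration: the tiny 1-D SOM, the choice to resample genes (rows) with replacement, the pairwise co-clustering score used as a stability measure, and all function and parameter names.

```python
import math
import random


def best_node(x, weights):
    """Return the index of the node whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda k: math.dist(x, weights[k]))


def train_som(data, n_nodes, n_iter, rng):
    """Train a small 1-D SOM (nodes arranged on a line) with online updates."""
    # Assumption: initialise each node from a randomly chosen expression profile.
    weights = [list(rng.choice(data)) for _ in range(n_nodes)]
    for t in range(n_iter):
        frac = t / n_iter
        lr = 0.5 * (1.0 - frac)                           # decaying learning rate
        sigma = max(0.3, (n_nodes / 2.0) * (1.0 - frac))  # shrinking neighbourhood
        x = rng.choice(data)
        bmu = best_node(x, weights)
        for k, w in enumerate(weights):
            h = math.exp(-((k - bmu) ** 2) / (2.0 * sigma ** 2))
            for j in range(len(w)):
                w[j] += lr * h * (x[j] - w[j])
    return weights


def node_stability(data, n_nodes=2, n_boot=30, n_iter=500, seed=0):
    """Score each reference node by how often its gene pairs stay together
    when the SOM is retrained on bootstrap resamples of the genes."""
    rng = random.Random(seed)
    ref = train_som(data, n_nodes, n_iter, rng)
    ref_labels = [best_node(x, ref) for x in data]

    # Gene pairs that share a node in the reference clustering.
    pairs = {node: [] for node in sorted(set(ref_labels))}
    for i in range(len(data)):
        for j in range(i + 1, len(data)):
            if ref_labels[i] == ref_labels[j]:
                pairs[ref_labels[i]].append((i, j))

    hits = {node: 0 for node in pairs}
    for _ in range(n_boot):
        # Assumption: resample genes (rows) with replacement.
        sample = [rng.choice(data) for _ in data]
        boot = train_som(sample, n_nodes, n_iter, rng)
        labels = [best_node(x, boot) for x in data]
        for node, node_pairs in pairs.items():
            hits[node] += sum(labels[i] == labels[j] for i, j in node_pairs)

    # Fraction of (pair, replicate) combinations that stayed together;
    # None for nodes holding a single gene, where no pair exists.
    stability = {
        node: hits[node] / (len(node_pairs) * n_boot) if node_pairs else None
        for node, node_pairs in pairs.items()
    }
    return ref_labels, stability


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic data: two groups of 4-condition expression profiles.
    low = [[rng.gauss(0.0, 0.3) for _ in range(4)] for _ in range(10)]
    high = [[rng.gauss(3.0, 0.3) for _ in range(4)] for _ in range(10)]
    labels, stability = node_stability(low + high)
    print(labels)
    print(stability)
```

Scoring pairs of genes rather than matching node indices sidesteps the fact that node numbering can differ between bootstrap replicates. A score near 1 suggests the node's membership is reproducible under resampling; a low score suggests it is not.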