Fast Iterative Gene Clustering Based on Information Theoretic Criteria for Selecting the Cluster Structure
Author: | Ioan Tabus, Jaakko Astola, Mauno Vihinen, Ciprian Doru Giurcaneanu, Juha Ollila |
---|---|
Publication year: | 2004 |
Subject: |
Clustering high-dimensional data; Fuzzy clustering; Single-linkage clustering; Correlation clustering; Information Theory; Databases, Genetic; Genetics; Cluster Analysis; Humans; Minimum description length; Molecular Biology; Mathematics; B-Lymphocytes; Models, Genetic; Gene Expression Profiling; Computational Biology; Cell Differentiation; Pattern recognition; Probability Theory; Determining the number of clusters in a data set; Computational Mathematics; Computational Theory and Mathematics; Multigene Family; Modeling and Simulation; Affinity propagation; Data mining; Artificial intelligence; Algorithms |
Source: | Tampere University |
ISSN: | 1557-8666 1066-5277 |
Description: | Grouping genes into clusters according to their expression levels is important for deriving biological information, e.g., on gene functions, based on microarray and other related analyses. The paper introduces a method based on the minimum description length (MDL) principle for selecting the number of clusters in gene expression data. The main feature of the new method is its ability to evaluate the number of clusters quickly according to the sound MDL principle, without exhaustive evaluation of all possible partitions of the gene set. The estimation method can be used in conjunction with various clustering algorithms. A recent clustering algorithm using principal component analysis, the "gene shaving" (GS) procedure, can be modified to use the new MDL estimation method in place of the Gap statistic originally used in the GS algorithm. The resulting clustering algorithm is shown to perform better than GS-Gap and CEM (classification expectation maximization) in simulations using artificial data. The proposed method is applied to B-cell differentiation data, and the resulting clusters are compared with those found by self-organizing maps (SOM). |
Database: | OpenAIRE |
External link: |
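
The description above says the number of clusters is chosen by the MDL principle, but the record does not give the criterion itself. As a rough illustration of the general idea only, the sketch below picks the number of clusters that minimizes a generic two-part code length (cost of cluster labels, plus cost of centroid parameters, plus cost of residuals under a spherical Gaussian model) using plain k-means. This is not the paper's criterion, its fast evaluation scheme, or gene shaving; the function names, the synthetic data, and the specific code-length formula are all assumptions made for the example.

```python
import numpy as np


def farthest_point_init(X, k):
    """Deterministic seeding (assumed for this sketch): start at the point
    nearest the overall mean, then repeatedly add the point farthest from
    the centers chosen so far."""
    start = np.argmin(((X - X.mean(axis=0)) ** 2).sum(axis=1))
    centers = [X[start]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    return np.array(centers)


def kmeans(X, k, n_iter=100):
    """Plain Lloyd k-means; returns hard labels and the residual sum of squares."""
    centers = farthest_point_init(X, k)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    rss = d2[np.arange(X.shape[0]), labels].sum()
    return labels, rss


def description_length(X, k):
    """Illustrative two-part code length in nats (an assumption, not the
    paper's formula): labels + centroid parameters + Gaussian residuals."""
    n, d = X.shape
    _, rss = kmeans(X, k)
    label_cost = n * np.log(k)                       # which cluster each row is in
    param_cost = 0.5 * k * d * np.log(n)             # k centroids, d coordinates each
    residual_cost = 0.5 * n * d * np.log(rss / (n * d))
    return label_cost + param_cost + residual_cost


def select_num_clusters(X, k_max=6):
    """Return the k in 1..k_max with the shortest description length."""
    lengths = {k: description_length(X, k) for k in range(1, k_max + 1)}
    return min(lengths, key=lengths.get)


def make_demo_data(seed=0):
    """Synthetic stand-in for expression profiles: 3 well-separated groups
    of 100 rows in 5 dimensions (hypothetical data, not from the paper)."""
    rng = np.random.default_rng(seed)
    centers = np.array([
        [0.0, 0.0, 0.0, 0.0, 0.0],
        [10.0, 0.0, 0.0, 0.0, 0.0],
        [0.0, 10.0, 0.0, 0.0, 0.0],
    ])
    return np.vstack([c + rng.normal(0.0, 0.5, size=(100, 5)) for c in centers])


if __name__ == "__main__":
    X = make_demo_data()
    print("selected number of clusters:", select_num_clusters(X))
```

The label term matters in this sketch: with hard assignments, splitting a true cluster always lowers the residual cost somewhat, so the parameter penalty alone may not be enough to stop over-splitting.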