Nearest Neighbor Networks: clustering expression data based on gene neighborhoods

Authors: Hilary A. Coller, Olga G. Troyanskaya, Curtis Huttenhower, Nathan O. Siemers, Chad L. Myers, Jessica N. Landis, Sauhard Sahi, Matthew A. Hibbs, Kellen L. Olszewski, Avi I. Flamholz
Year of publication: 2007
Subject:
Clustering high-dimensional data
Genes, Fungal
Gene Expression
Saccharomyces cerevisiae
02 engineering and technology
Biology
lcsh:Computer applications to medicine. Medical informatics
Machine learning
computer.software_genre
Biochemistry
k-nearest neighbors algorithm
Task (project management)
03 medical and health sciences
Structural Biology
Gene Expression Regulation, Fungal
Databases, Genetic
0202 electrical engineering, electronic engineering, information engineering
Cluster Analysis
Cluster analysis
lcsh:QH301-705.5
Molecular Biology
030304 developmental biology
0303 health sciences
Microarray analysis techniques
business.industry
Gene Expression Profiling
Applied Mathematics
Small number
Computer Science Applications
Gene expression profiling
lcsh:Biology (General)
ROC Curve
lcsh:R858-859.7
020201 artificial intelligence & image processing
Artificial intelligence
Data mining
DNA microarray
business
computer
Algorithms
Software
Source: BMC Bioinformatics
BMC Bioinformatics, Vol 8, Iss 1, p 250 (2007)
ISSN: 1471-2105
Description:
Background: The availability of microarrays measuring thousands of genes simultaneously across hundreds of biological conditions represents an opportunity to understand both individual biological pathways and the integrated workings of the cell. However, translating this amount of data into biological insight remains a daunting task. An important initial step in the analysis of microarray data is clustering of genes with similar behavior. A number of classical techniques are commonly used to perform this task, particularly hierarchical and K-means clustering, and many novel approaches have been suggested recently. While these approaches are useful, they are not without drawbacks: they can find clusters in purely random data, and even clusters enriched for biological functions can be skewed towards a small number of processes (e.g., ribosomes).
Results: We developed Nearest Neighbor Networks (NNN), a graph-based algorithm to generate clusters of genes with similar expression profiles. This method produces clusters based on overlapping cliques within an interaction network generated from mutual nearest neighborhoods. This focus on nearest neighbors rather than on absolute distance measures allows us to capture clusters with high connectivity even when they are spatially separated, and requiring mutual nearest neighbors allows genes with no sufficiently similar partners to remain unclustered. We compared the clusters generated by NNN with those generated by eight other clustering methods. NNN was particularly successful at generating functionally coherent clusters with high precision, and these clusters generally represented a much broader selection of biological processes than those recovered by other methods.
Conclusion: The Nearest Neighbor Networks algorithm is a valuable clustering method that effectively groups genes that are likely to be functionally related. It is particularly attractive due to its simplicity, its success in the analysis of large datasets, and its ability to span a wide range of biological functions with high precision.
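Illustration: the Results paragraph of the abstract describes clustering by overlapping cliques in a graph of mutual nearest neighbors. The following is a minimal Python sketch of that general idea, not the authors' implementation. The similarity measure (Pearson correlation), the parameters `k` and `min_clique`, every function name, and the toy data are assumptions made for illustration; the published algorithm may differ in its details.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def mutual_knn_graph(profiles, k):
    """Link two genes only if each is among the other's k most similar genes."""
    genes = list(profiles)
    top = {}
    for g in genes:
        ranked = sorted((h for h in genes if h != g),
                        key=lambda h: pearson(profiles[g], profiles[h]),
                        reverse=True)
        top[g] = set(ranked[:k])
    return {g: {h for h in top[g] if g in top[h]} for g in genes}


def maximal_cliques(adj):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    found = []

    def expand(r, p, x):
        if not p and not x:
            found.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return found


def cluster(profiles, k=3, min_clique=3):
    """Merge overlapping cliques of the mutual k-NN graph into clusters."""
    adj = mutual_knn_graph(profiles, k)
    cliques = [c for c in maximal_cliques(adj) if len(c) >= min_clique]
    clusters = []
    for c in cliques:
        merged = set(c)
        rest = []
        for existing in clusters:
            if existing & merged:
                merged |= existing  # shared gene: fold the clusters together
            else:
                rest.append(existing)
        clusters = rest + [merged]
    return clusters


# Hypothetical toy data: three rising genes, three falling genes, one zigzag gene.
toy_profiles = {
    "a1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "a2": [1.1, 2.0, 3.2, 3.9, 5.1],
    "a3": [0.9, 2.1, 2.9, 4.2, 4.8],
    "b1": [5.0, 4.0, 3.0, 2.0, 1.0],
    "b2": [5.2, 3.9, 3.1, 2.0, 0.9],
    "b3": [4.8, 4.1, 2.9, 2.2, 1.1],
    "o": [1.0, 5.0, 1.0, 5.0, 1.0],
}
clusters = cluster(toy_profiles, k=2, min_clique=3)
```

In this toy run the zigzag gene has no mutual neighbor and therefore joins no cluster, which mirrors the abstract's point that genes without sufficiently similar partners can remain unclustered. The exhaustive clique enumeration used here is only practical for small inputs.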
Database: OpenAIRE