Clustering Large Probabilistic Graphs

Autor: George Kollios, Michalis Potamias, Evimaria Terzi
Rok vydání: 2013
Předmět:
Zdroj: IEEE Transactions on Knowledge and Data Engineering. 25:325-336
ISSN: 1041-4347
DOI: 10.1109/tkde.2011.243
Popis: We study the problem of clustering probabilistic graphs. Similar to the problem of clustering standard graphs, probabilistic graph clustering has numerous applications, such as finding complexes in probabilistic protein-protein interaction (PPI) networks and discovering groups of users in affiliation networks. We extend the edit-distance-based definition of graph clustering to probabilistic graphs. We establish a connection between our objective function and correlation clustering to propose practical approximation algorithms for our problem. A benefit of our approach is that our objective function is parameter-free. Therefore, the number of clusters is part of the output. We also develop methods for testing the statistical significance of the output clustering and study the case of noisy clusterings. Using a real protein-protein interaction network and ground-truth data, we show that our methods discover the correct number of clusters and identify established protein relationships. Finally, we show the practicality of our techniques using a large social network of Yahoo! users consisting of one billion edges.
Databáze: OpenAIRE