An empirical comparison of Big Graph frameworks in the context of network analysis

Autor: Jannis Koch, Henning Meyerhenke, Christian L. Staudt, Maximilian Vogel
Rok vydání: 2016
Předmět:
Zdroj: Social Network Analysis and Mining. 6
ISSN: 1869-5469
1869-5450
DOI: 10.1007/s13278-016-0394-1
Popis: Complex networks are heterogeneous relational data sets with nontrivial substructures and statistical properties. They are typically represented as graphs consisting of vertices and edges. The analysis of their intricate structure is relevant to many areas of science and commerce, and data sets may reach sizes that require distributed storage and processing. We describe and compare programming models for distributed computing with a focus on graph algorithms for large-scale complex network analysis. Four frameworks-GraphLab, Apache Giraph, Giraph++ and Apache Flink—are used to implement algorithms for the representative problems connected components, community detection, PageRank and clustering coefficients. The implementations are executed on a computer cluster to evaluate the frameworks’ suitability in practice and to compare their performance to that of the single-machine, shared-memory parallel network analysis package NetworKit. Out of the distributed frameworks, GraphLab and Apache Giraph generally show the best performance. In our experiments a cluster of eight computers running Apache Giraph enables the analysis of a network with ca. 2 billion edges, which is too large for a single machine of the same type. However, for networks that fit into memory of one machine, the performance of the shared-memory parallel implementation is usually far better than the distributed ones. The study provides experimental evidence for selecting the appropriate framework depending on the task and data volume.
Databáze: OpenAIRE