On the Distributed Complexity of Large-Scale Graph Computations

Authors: Peter Robinson, Michele Scquizzato, Gopal Pandurangan
Year of publication: 2021
Subject:
PageRank
FOS: Computer and information sciences
Logarithm
Computer science
Computation
Triangle enumeration
Scale (descriptive set theory)
0102 computer and information sciences
02 engineering and technology
Clique (graph theory)
01 natural sciences
Upper and lower bounds
Software
Theoretical Computer Science
Hardware and Architecture
law.invention
law
020204 information systems
Computer Science - Data Structures and Algorithms
Enumeration
0202 electrical engineering, electronic engineering, information engineering
Partition (number theory)
Data Structures and Algorithms (cs.DS)
Communication complexity
Mathematics
Discrete mathematics
Distributed graph algorithms
Lower bounds
Partition (database)
Computer Science Applications
Computer Science - Distributed, Parallel, and Cluster Computing
Computational Theory and Mathematics
010201 computation theory & mathematics
Distributed algorithm
Modeling and Simulation
Distributed, Parallel, and Cluster Computing (cs.DC)
Source: Proceedings of the 30th on Symposium on Parallelism in Algorithms and Architectures
SPAA
ISSN: 2329-4957
2329-4949
DOI: 10.1145/3460900
Description: Motivated by the increasing need to understand the distributed algorithmic foundations of large-scale graph computations, we study some fundamental graph problems in a message-passing model for distributed computing where k ≥ 2 machines jointly perform computations on graphs with n nodes (typically, n >> k). The input graph is assumed to be initially randomly partitioned among the k machines, a common implementation in many real-world systems. Communication is point-to-point, and the goal is to minimize the number of communication rounds of the computation. Our main contribution is the General Lower Bound Theorem, a theorem that can be used to show non-trivial lower bounds on the round complexity of distributed large-scale data computations. This result is established via an information-theoretic approach that relates the round complexity to the minimal amount of information required by machines to solve the problem. Our approach is generic, and this theorem can be used in a “cookbook” fashion to show distributed lower bounds for several problems, including non-graph problems. We present two applications by showing (almost) tight lower bounds on the round complexity of two fundamental graph problems, namely, PageRank computation and triangle enumeration. These applications show that our approach can yield lower bounds for problems where the application of communication complexity techniques is not obvious or gives weak bounds, including and especially under a stochastic partition of the input. We then present distributed algorithms for PageRank and triangle enumeration with a round complexity that (almost) matches the respective lower bounds; these algorithms exhibit a round complexity that scales superlinearly in k, improving significantly over previous results [Klauck et al., SODA 2015].
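As a rough illustration of the input model described above, the following Python sketch assigns each node to one of k machines uniformly at random. It is only a sketch under stated assumptions: the abstract does not spell out the partitioning rule, so the choice that a machine hosting a node also holds that node's incident edges is an assumption here, and the names `random_vertex_partition` and `cross_machine_edges` are hypothetical helpers, not taken from the paper.

```python
import random
from collections import defaultdict


def random_vertex_partition(edges, n, k, seed=0):
    """Assign each of n nodes to one of k machines uniformly at random.

    Assumption (not stated in the abstract): the machine hosting a node
    also holds every edge incident to that node.
    """
    rng = random.Random(seed)
    # home[v] is the index of the machine that hosts node v.
    home = [rng.randrange(k) for _ in range(n)]
    local_edges = defaultdict(set)
    for u, v in edges:
        # An edge is stored on the home machine of each endpoint.
        local_edges[home[u]].add((u, v))
        local_edges[home[v]].add((u, v))
    return home, dict(local_edges)


def cross_machine_edges(edges, home):
    """Count edges whose endpoints are hosted on different machines."""
    return sum(1 for u, v in edges if home[u] != home[v])


if __name__ == "__main__":
    # Toy input: a cycle on 12 nodes, split across 3 machines.
    cycle = [(i, (i + 1) % 12) for i in range(12)]
    home, local_edges = random_vertex_partition(cycle, n=12, k=3)
    print("node -> machine:", home)
    print("edges crossing machines:", cross_machine_edges(cycle, home))
```

Edges that cross machines are the ones whose processing may require point-to-point communication, which is the resource the round complexity measures.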
Specifically, we show the following results: PageRank: We show a lower bound of $\tilde{\Omega}(n/k^2)$ rounds and present a distributed algorithm that computes an approximation of the PageRank of all the nodes of a graph in $\tilde{O}(n/k^2)$ rounds. Triangle enumeration: We show that there exist graphs with m edges where any distributed algorithm requires $\tilde{\Omega}(m/k^{5/3})$ rounds. This result also implies the first non-trivial lower bound of $\tilde{\Omega}(n^{1/3})$ rounds for the congested clique model, which is tight up to logarithmic factors. We then present a distributed algorithm that enumerates all the triangles of a graph in $\tilde{O}(m/k^{5/3} + n/k^{4/3})$ rounds.
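The arithmetic behind these bounds can be checked with a short Python sketch. It evaluates only the leading terms and ignores the polylogarithmic factors hidden by the tilde notation, so the values are orders of magnitude, not exact round counts. The function names are hypothetical, and the reading of the congested clique as k = n machines on a dense graph with m on the order of n^2 edges is an assumption made here to show how n^(1/3) falls out of m/k^(5/3).

```python
import math


def pagerank_rounds(n, k):
    """Leading term n / k^2 of the PageRank bound (polylog factors ignored)."""
    return n / k**2


def triangle_lower_bound(m, k):
    """Leading term m / k^(5/3) of the triangle enumeration lower bound."""
    return m / k ** (5 / 3)


def triangle_upper_bound(m, n, k):
    """Leading terms m / k^(5/3) + n / k^(4/3) of the algorithm's round count."""
    return m / k ** (5 / 3) + n / k ** (4 / 3)


if __name__ == "__main__":
    n = 1000
    # Doubling k divides n / k^2 by four: scaling is superlinear in k.
    print(pagerank_rounds(n, 10), pagerank_rounds(n, 20))
    # Assumed congested clique setting: k = n machines, m ~ n^2 edges.
    print(triangle_lower_bound(m=n * n, k=n), n ** (1 / 3))
    # Upper bound leading terms for a sparser graph on 10 machines.
    print(triangle_upper_bound(m=10 * n, n=n, k=10))
```

With k = n and m = n^2, the quotient n^2 / n^(5/3) simplifies to n^(1/3), which matches the congested clique bound quoted above.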
Database: OpenAIRE