Popis: |
In this thesis we develop a package of generic methods for boosting the velocity of graph computations, regarding partitioning and incrementalization. The former is to deal with the challenges of volume and velocity over big data, and make computations scalable. The latter is to speed up computations for dynamic graph analytics. On the one hand, existing partitioners handle graphs in an “one-size-fits-all” way and do not consider the applications running on the top. This results in sub-optimal performances for distributed computations. On the other hand, however, current incremental graph algorithms are ad hoc designed, which require a lot of efforts even for domain experts. Worse still, most of these algorithms lack provable guarantees for their efficiency. To this end, this thesis presents a series of generic yet effective approaches for boosting the efficiency and scalability of graph computation. Firstly, for graph partitioning, we propose an application-driven hybrid partitioning strategy that takes the top-level application into consideration. Given a graph algorithm A, the strategy learns a cost model for A as polynomial regression. We develop partitioners that given the learned cost model, refine an edge-cut or vertex-cut partition to a hybrid partition and reduce the parallel cost of A. Moreover, we extend the partitioners to handle multiple cost models of mixed workloads in one batch. Secondly, we study generic guidelines for incrementalizing graph algorithms. We identify a class of incrementalizable algorithms abstracted in a fixpoint model. We show how to deduce an incremental algorithm A∆ from such an algorithm A. Moreover, A∆ can be made bounded relative to A, i.e., its cost is determined by the size of changes to graphs and changes to the affected area that is necessarily checked by batch algorithm A. We provide generic conditions under which a deduced algorithm A∆ warrants to be correct and relatively bounded, by adopting the same logic and data structures of A, at most using timestamps as an additional auxiliary structure. Finally, we go beyond the incrementalization of generic graph algorithms, and focus on graph partitioners where the algorithms are heuristic and exact results are not required. We propose to incrementalize widely-used graph partitioners A into heuristically-bounded incremental algorithms A∆. Given graph G, updates ∆G to G and a partition A(G) of G by A, A∆ computes changes ∆O to A(G) such that (1) applying ∆O to A(G) produces a new partition of the updated graph although it may not be exactly the one derived by A, (2) it retains the same bounds on balance and cut sizes as A, and (3) ∆O is decided by ∆G alone. We show that we can deduce A∆ from both vertex-cut and edge-cut partitioners A, retaining their bounds |