GraphQ
Authors: | Chao Wang, Yanzhi Wang, Mingxing Zhang, Niu Dimin, Rui Wang, Youwei Zhuo, Xuehai Qian |
---|---|
Year of publication: | 2019 |
Subject: | 010302 applied physics; Speedup; Computer science; Computation; 02 engineering and technology; Parallel computing; 01 natural sciences; Graph; 020202 computer hardware & architecture; Vertex (geometry); Asynchronous communication; Bounded function; 0103 physical sciences; Scalability; 0202 electrical engineering, electronic engineering, information engineering; Tesseract; Execution model; Conventional memory |
Source: | MICRO |
DOI: | 10.1145/3352460.3358256 |
Description: | Processing-in-memory (PIM) architectures based on recent technology advances (e.g., the Hybrid Memory Cube) show great potential for graph processing. However, existing solutions do not address the key challenge of graph processing: irregular data movement. This paper proposes GraphQ, a PIM-based graph processing architecture that improves on the recent Tesseract architecture by fundamentally eliminating irregular data movement. GraphQ draws on ideas from distributed graph processing and irregular applications, and it enables static, structured communication through runtime and architecture co-design. Specifically, GraphQ realizes 1) batched and overlapped inter-cube communication by reordering the vertex processing order, and 2) streamlined inter-cube communication by using heterogeneous cores for different access types. Moreover, to address the gap between inter-cube and inter-node bandwidth, we propose a hybrid execution model that performs additional local computation during inter-node communication. This model is general and applies to asynchronous iterative algorithms that can tolerate bounded stale values. Putting it all together, GraphQ simultaneously maximizes intra-cube, inter-cube, and inter-node communication throughput. In a zSim-based simulator with five real-world graphs and four algorithms, GraphQ achieves an average speedup of 3.3× (maximum 13.9×) and 81% energy savings compared with Tesseract. We show that increasing memory size in PIM also proportionally increases compute capability: a 4-node GraphQ achieves a 98.34× speedup compared with a single node with the same memory size and a conventional memory hierarchy. |
Database: | OpenAIRE |
External link: |
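
The first mechanism in the description (batched inter-cube communication obtained by reordering vertex processing) can be illustrated with a toy Python model. This is only a sketch under stated assumptions and not GraphQ's runtime or hardware design. The vertex-to-cube mapping `cube_of` (vertex id modulo the number of cubes), the push-style sum update, and the function name `push_iteration` are all hypothetical. The sketch groups edges by (source cube, destination cube) and combines updates to the same destination locally. It then compares the number of remote messages a per-edge scheme would send with the number sent when each cube pair exchanges one batch.

```python
from collections import defaultdict


def cube_of(v, num_cubes):
    # Hypothetical partitioning: vertex v is owned by cube (v mod num_cubes).
    return v % num_cubes


def push_iteration(edges, values, num_cubes):
    """One push-style iteration: each source adds its value to each out-neighbor.

    Returns (accumulated updates per vertex, fine-grained remote message count,
    batched remote message count).
    """
    # Reorder work: group edges by (source cube, destination cube) so that all
    # updates bound for one remote cube are produced together.
    groups = defaultdict(list)
    for src, dst in edges:
        groups[(cube_of(src, num_cubes), cube_of(dst, num_cubes))].append((src, dst))

    acc = defaultdict(float)
    fine_grained = 0  # one message per cross-cube edge update (irregular model)
    batched = 0       # one message per (source cube, destination cube) pair
    for (sc, dc), group in sorted(groups.items()):
        batch = defaultdict(float)
        for src, dst in group:
            # Combine updates to the same destination before sending.
            batch[dst] += values[src]
        if sc != dc:
            fine_grained += len(group)
            batched += 1
        # Deliver the whole batch to cube dc at once.
        for dst, delta in batch.items():
            acc[dst] += delta
    return dict(acc), fine_grained, batched
```

For example, with edges `[(0, 1), (0, 3), (2, 1), (2, 3), (1, 0), (3, 2)]` and two cubes, every edge crosses cubes. The six fine-grained remote updates therefore collapse into two batched messages, one per direction. This captures the intuition of turning irregular remote accesses into structured bulk transfers, but it does not model GraphQ's overlap of communication with computation or its heterogeneous cores.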