A Microbenchmark Characterization of the Emu Chick
Authors: | Eric R. Hein, Thomas M. Conte, Patrick Lavin, Srinivas Eswar, Jason Riedy, Jeffrey Young, Richard Vuduc, Jiajia Li |
---|---|
Publication year: | 2018 |
Subject: |
FOS: Computer and information sciences
Computer Networks and Communications; Computer science; PowerPC; 010103 numerical & computational mathematics; 02 engineering and technology; Thread (computing); Parallel computing; 01 natural sciences; Theoretical Computer Science; Artificial Intelligence; Hardware Architecture (cs.AR); 0202 electrical engineering, electronic engineering, information engineering; 0101 mathematics; Field-programmable gate array; Computer Science - Hardware Architecture; Xeon; Locality; Memory bandwidth; Computer Graphics and Computer-Aided Design; 020202 computer hardware & architecture; Computer Science - Distributed, Parallel, and Cluster Computing; Hardware and Architecture; Cache; Distributed, Parallel, and Cluster Computing (cs.DC); Software |
DOI: | 10.48550/arxiv.1809.07696 |
Description: | The Emu Chick is a prototype system designed around the concept of migratory memory-side processing. Rather than transferring large amounts of data across power-hungry, high-latency interconnects, the Emu Chick moves lightweight thread contexts to near-memory cores before the beginning of each memory read. The current prototype hardware uses FPGAs to implement cache-less “Gossamer” cores for computational work and relies on a typical stationary core (PowerPC) to run basic operating system functions and migrate threads between nodes. In this multi-node characterization of the Emu Chick, we extend an earlier single-node investigation [1] of the system's memory bandwidth characteristics using benchmarks such as STREAM, pointer chasing, and sparse matrix-vector multiplication (SpMV). We compare the Emu Chick hardware to architectural simulation and to an Intel Xeon-based platform. Our results demonstrate that, for many basic operations, the Emu Chick can use available memory bandwidth more efficiently than a more traditional, cache-based architecture, although bandwidth usage suffers for computationally intensive workloads like SpMV. Moreover, the Emu Chick provides stable, predictable performance, reaching up to 65% of peak bandwidth utilization on a random-access pointer chasing benchmark with weak locality. |
Database: | OpenAIRE |
External link: |