Cohmeleon: Learning-Based Orchestration of Accelerator Coherence in Heterogeneous SoCs
Author: | Joseph Zuckerman, Davide Giri, Luca P. Carloni, Jihye Kwon, Paolo Mantovani |
---|---|
Language: | English |
Year of publication: | 2021 |
Subject: | 010302 applied physics; FOS: Computer and information sciences; Speedup; Hardware_MEMORYSTRUCTURES; Memory hierarchy; Computer science; business.industry; Q-learning; 02 engineering and technology; 01 natural sciences; 020202 computer hardware & architecture; Embedded system; 0103 physical sciences; Hardware Architecture (cs.AR); 0202 electrical engineering, electronic engineering, information engineering; Reinforcement learning; Overhead (computing); System on a chip; Cache; business; Computer Science - Hardware Architecture; Cache coherence |
Source: | MICRO |
Description: | One of the most critical aspects of integrating loosely-coupled accelerators in heterogeneous SoC architectures is orchestrating their interactions with the memory hierarchy, especially in terms of navigating the various cache-coherence options: from accelerators accessing off-chip memory directly, bypassing the cache hierarchy, to accelerators having their own private cache. By running real-size applications on FPGA-based prototypes of many-accelerator multi-core SoCs, we show that the best cache-coherence mode for a given accelerator varies at runtime, depending on the accelerator's characteristics, the workload size, and the overall SoC status. Cohmeleon applies reinforcement learning to select the best coherence mode for each accelerator dynamically at runtime, as opposed to statically at design time. It makes these selections adaptively, by continuously observing the system and measuring its performance. Cohmeleon is accelerator-agnostic and architecture-independent, and it requires minimal hardware support. It is also transparent to application programmers and has negligible software overhead. FPGA-based experiments show that our runtime approach offers, on average, a 38% speedup with a 66% reduction of off-chip memory accesses compared to state-of-the-art design-time approaches. Moreover, it can match runtime solutions that are manually tuned for the target architecture. To appear in the 54th IEEE/ACM Symposium on Microarchitecture (MICRO 2021). |
Database: | OpenAIRE |
External link: |
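
The description says that a coherence mode is chosen per accelerator at runtime by reinforcement learning, and the subject terms mention Q-learning. The following is a minimal sketch of how such a selector could look, not the authors' implementation. Everything in it is an assumption made for illustration: the four mode labels (the abstract names only the two extremes, bypassing the cache hierarchy and a private cache), the state discretization in `make_state`, the reward in `reward_from`, and all parameter values.

```python
import random
from collections import defaultdict

# Illustrative labels only (assumption): the abstract names just the two
# extremes, direct off-chip access and a private cache per accelerator.
MODES = ["non_coherent_dma", "llc_coherent_dma", "coherent_dma", "fully_coherent"]


def make_state(footprint_bytes, active_accelerators):
    """Hypothetical discretization of workload size and SoC load."""
    size_bin = "small" if footprint_bytes < 256 * 1024 else "large"
    load_bin = min(active_accelerators, 3)  # cap so the table stays small
    return (size_bin, load_bin)


def reward_from(exec_time, offchip_accesses, time_ref, access_ref):
    """Hypothetical reward: lower time and fewer off-chip accesses score higher."""
    return -(exec_time / time_ref + offchip_accesses / access_ref)


class CoherenceSelector:
    """Tabular, epsilon-greedy Q-learning over coherence modes (sketch)."""

    def __init__(self, alpha=0.25, gamma=0.5, epsilon=0.1, seed=0):
        self.alpha = alpha      # learning rate (assumed value)
        self.gamma = gamma      # discount factor (assumed value)
        self.epsilon = epsilon  # exploration probability (assumed value)
        self.rng = random.Random(seed)
        # One row of Q-values per observed state, initialized to zero.
        self.q = defaultdict(lambda: {m: 0.0 for m in MODES})

    def select(self, state):
        """Explore with probability epsilon, otherwise pick the best-known mode."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(MODES)
        values = self.q[state]
        return max(MODES, key=lambda m: values[m])

    def update(self, state, mode, reward, next_state):
        """Standard one-step Q-learning update."""
        best_next = max(self.q[next_state].values())
        target = reward + self.gamma * best_next
        self.q[state][mode] += self.alpha * (target - self.q[state][mode])


if __name__ == "__main__":
    selector = CoherenceSelector()
    state = make_state(footprint_bytes=64 * 1024, active_accelerators=2)
    mode = selector.select(state)
    # In a real system these numbers would come from hardware counters;
    # here they are made-up placeholders.
    reward = reward_from(exec_time=1.2, offchip_accesses=500,
                         time_ref=1.0, access_ref=1000)
    selector.update(state, mode, reward, next_state=state)
    print(mode, selector.q[state][mode])
```

In this sketch, the runtime would call `select` before each accelerator invocation and `update` after it completes, so the table keeps adapting as workload sizes and system load change.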