Evaluating and Mitigating Bandwidth Bottlenecks Across the Memory Hierarchy in GPUs

Author:	Nigel Topham, Saumay Dublish, Vijay Nagarajan
Language:	English
Year of publication:	2017
Subject:	010302 applied physics; Hardware_MEMORYSTRUCTURES; Memory hierarchy; Computer science; Cache-only memory architecture; Memory bandwidth; 02 engineering and technology; High Bandwidth Memory; Cache pollution; 01 natural sciences; 020202 computer hardware & architecture; Non-uniform memory access; Computer architecture; 0103 physical sciences; 0202 electrical engineering electronic engineering information engineering; Bandwidth (computing); Interleaved memory
Source:	Dublish, S, Nagarajan, V & Topham, N 2017, Evaluating and Mitigating Bandwidth Bottlenecks Across the Memory Hierarchy in GPUs. in 2017 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). Institute of Electrical and Electronics Engineers (IEEE), pp. 239-248, 2017 IEEE International Symposium on Performance Analysis of Systems and Software, Santa Rosa, United States, 24/04/17. https://doi.org/10.1109/ISPASS.2017.7975295
DOI:	10.1109/ISPASS.2017.7975295
Description:	GPUs are often limited by off-chip memory bandwidth. With the advent of general-purpose computing on GPUs, a cache hierarchy has been introduced to filter the bandwidth demand to the off-chip memory. However, the cache hierarchy presents its own bandwidth limitations in sustaining such high levels of memory traffic. In this paper, we characterize the bandwidth bottlenecks present across the memory hierarchy in GPUs for general-purpose applications. We quantify the stalls throughout the memory hierarchy and identify the architectural parameters that play a pivotal role in leading to a congested memory system. We explore the architectural design space to mitigate the bandwidth bottlenecks and show that the performance improvement achieved by mitigating the bandwidth bottleneck in the cache hierarchy can exceed the speedup obtained by a memory system with a baseline cache hierarchy and High Bandwidth Memory (HBM) DRAM. We also show that addressing the bandwidth bottleneck in isolation at specific levels can be sub-optimal and can even be counter-productive. Therefore, we show that it is imperative to resolve the bandwidth bottlenecks synergistically across different levels of the memory hierarchy. With the insights developed in this paper, we perform a cost-benefit analysis and identify cost-effective configurations of the memory hierarchy that effectively mitigate the bandwidth bottlenecks. We show that our final configuration achieves an average performance improvement of 29% with a minimal area overhead of 1.6%.
Database:	OpenAIRE
External links:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::a7b38ea3b4882ea4347f597aa0271272 https://hdl.handle.net/20.500.11820/2d479e5f-245c-4b81-be9c-74c101a61062
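The abstract's point that relieving a bandwidth bottleneck at one level in isolation can be sub-optimal can be illustrated with a simple min-of-bounds model. This is a minimal sketch, not the authors' evaluation methodology: each level's sustainable bandwidth caps the core-side request rate once it is scaled by the fraction of traffic that misses in all levels above it, and the tightest cap wins. The `Level` class, `max_throughput` function, and all bandwidth and hit-rate figures are hypothetical. A static bound like this cannot reproduce the counter-productive cases the abstract mentions.

```python
from dataclasses import dataclass


@dataclass
class Level:
    name: str
    bandwidth: float  # sustainable bandwidth in bytes/cycle (hypothetical values)
    hit_rate: float   # fraction of arriving traffic this level absorbs (0.0 for DRAM)


def max_throughput(levels):
    """Upper bound on core-side bytes/cycle, plus the level that sets it.

    Traffic reaching a level is the core-side rate scaled by the miss rates
    of all levels above it, so each level caps the core-side rate at
    bandwidth / fraction_reaching_it. The tightest cap wins.
    """
    best, binding = float("inf"), None
    fraction = 1.0  # fraction of core-side traffic that reaches the current level
    for level in levels:
        if fraction == 0.0:
            break  # everything was absorbed above; deeper levels impose no cap
        cap = level.bandwidth / fraction
        if cap < best:
            best, binding = cap, level.name
        fraction *= 1.0 - level.hit_rate
    return best, binding


if __name__ == "__main__":
    baseline = [Level("L1", 128, 0.5), Level("L2", 32, 0.5), Level("DRAM", 12, 0.0)]
    # Only DRAM bandwidth doubled (an HBM-like change applied in isolation).
    dram_only = [Level("L1", 128, 0.5), Level("L2", 32, 0.5), Level("DRAM", 24, 0.0)]
    # DRAM and L2 bandwidth both doubled.
    both = [Level("L1", 128, 0.5), Level("L2", 64, 0.5), Level("DRAM", 24, 0.0)]
    for label, cfg in [("baseline", baseline), ("DRAM x2", dram_only), ("DRAM+L2 x2", both)]:
        rate, bound_by = max_throughput(cfg)
        print(f"{label:>10}: {rate:.1f} B/cycle, bound by {bound_by}")
```

With these made-up numbers, doubling DRAM bandwidth alone raises the bound only from 48 to 64 B/cycle (1.33x rather than 2x) because L2 becomes the binding level; doubling L2 bandwidth as well lifts the bound to 96 B/cycle.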