Showing 1 - 5 of 5 for search: '"Saumay Dublish"'
Published in:
Dublish, S, Nagarajan, V & Topham, N 2019, Poise: Balancing Thread-Level Parallelism and Memory System Performance in GPUs using Machine Learning. in 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA). Institute of Electrical and Electronics Engineers (IEEE), Washington, DC, USA, pp. 492-505, 25th IEEE International Symposium on High-Performance Computer Architecture, Washington D.C., District of Columbia, United States, 16/02/19. https://doi.org/10.1109/HPCA.2019.00061
HPCA
GPUs employ a high degree of thread-level parallelism (TLP) to hide the long latency of memory operations. However, the consequent increase in demand on the memory system causes pathological effects such as cache thrashing and bandwidth bottlenecks.
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::eb587e5e2a7c803d5e81cf70b90eff26
https://hdl.handle.net/20.500.11820/6e5d9bb6-1361-4d07-8317-fbe3429626d6
Published in:
Dublish, S, Nagarajan, V & Topham, N 2017, Evaluating and Mitigating Bandwidth Bottlenecks Across the Memory Hierarchy in GPUs. in 2017 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). Institute of Electrical and Electronics Engineers (IEEE), pp. 239-248, 2017 IEEE International Symposium on Performance Analysis of Systems and Software, Santa Rosa, United States, 24/04/17. https://doi.org/10.1109/ISPASS.2017.7975295
ISPASS
GPUs are often limited by off-chip memory bandwidth. With the advent of general-purpose computing on GPUs, a cache hierarchy has been introduced to filter the bandwidth demand to the off-chip memory. However, the cache hierarchy presents its own band
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::a7b38ea3b4882ea4347f597aa0271272
https://hdl.handle.net/20.500.11820/2d479e5f-245c-4b81-be9c-74c101a61062
Published in:
Dublish, S, Nagarajan, V & Topham, N 2016, 'Cooperative Caching for GPUs', ACM Transactions on Architecture and Code Optimization, vol. 13, no. 4, 39, pp. 1-25. https://doi.org/10.1145/3001589
The rise of general-purpose computing on GPUs has influenced architectural innovation on them. The introduction of an on-chip cache hierarchy is one such innovation. High L1 miss rates on GPUs, however, indicate inefficient cache usage due to myriad
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::3b1a6e0dec93e95f86a2ec795bebfa19
https://www.pure.ed.ac.uk/ws/files/29959329/taco16_dublish_PURE_1.pdf
Published in:
Dublish, S, Nagarajan, V & Topham, N 2016, Characterizing memory bottlenecks in GPGPU workloads. in 2016 IEEE International Symposium on Workload Characterization (IISWC). Institute of Electrical and Electronics Engineers (IEEE), Providence, RI, USA, pp. 1-2, 2016 IEEE International Symposium on Workload Characterization, Providence, United States, 25/09/16. https://doi.org/10.1109/IISWC.2016.7581287
IISWC
GPUs are often limited by the off-chip memory bandwidth. With the advent of general-purpose computing on GPUs, cache hierarchy has been introduced to filter the bandwidth demand to the off-chip memory. However, the cache hierarchy presents its own ba
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::6f46d50079f754d7e79f7450f125f08f
https://www.pure.ed.ac.uk/ws/files/28417670/103_Dublish_PID4414159_1.pdf
Author:
Saumay Dublish
Published in:
PACT
Due to lack of sufficient compute threads in memory-intensive applications, GPUs often exhaust all the active warps and therefore, the memory latencies get exposed and appear in the critical path. In such a scenario, the shared on-chip and off-chip m