Search results - "Saumay Dublish"

Poise: Balancing Thread-Level Parallelism and Memory System Performance in GPUs using Machine Learning

Authors: Nigel Topham, Vijay Nagarajan, Saumay Dublish

Published in: Dublish, S, Nagarajan, V & Topham, N 2019, Poise: Balancing Thread-Level Parallelism and Memory System Performance in GPUs using Machine Learning. in 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA). Institute of Electrical and Electronics Engineers (IEEE), Washington, DC, USA, pp. 492-505, 25th IEEE International Symposium on High-Performance Computer Architecture, Washington D.C., District of Columbia, United States, 16/02/19. https://doi.org/10.1109/HPCA.2019.00061
HPCA

GPUs employ a high degree of thread-level parallelism (TLP) to hide the long latency of memory operations. However, the consequent increase in demand on the memory system causes pathological effects such as cache thrashing and bandwidth bottlenecks.

External link: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::eb587e5e2a7c803d5e81cf70b90eff26
https://hdl.handle.net/20.500.11820/6e5d9bb6-1361-4d07-8317-fbe3429626d6

Evaluating and Mitigating Bandwidth Bottlenecks Across the Memory Hierarchy in GPUs

Authors: Nigel Topham, Saumay Dublish, Vijay Nagarajan

Published in: Dublish, S, Nagarajan, V & Topham, N 2017, Evaluating and Mitigating Bandwidth Bottlenecks Across the Memory Hierarchy in GPUs. in 2017 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). Institute of Electrical and Electronics Engineers (IEEE), pp. 239-248, 2017 IEEE International Symposium on Performance Analysis of Systems and Software, Santa Rosa, United States, 24/04/17. https://doi.org/10.1109/ISPASS.2017.7975295
ISPASS

GPUs are often limited by off-chip memory bandwidth. With the advent of general-purpose computing on GPUs, a cache hierarchy has been introduced to filter the bandwidth demand to the off-chip memory. However, the cache hierarchy presents its own band

External link: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::a7b38ea3b4882ea4347f597aa0271272
https://hdl.handle.net/20.500.11820/2d479e5f-245c-4b81-be9c-74c101a61062

Cooperative Caching for GPUs

Authors: Vijay Nagarajan, Nigel Topham, Saumay Dublish

Published in: Dublish, S, Nagarajan, V & Topham, N 2016, 'Cooperative Caching for GPUs', ACM Transactions on Architecture and Code Optimization, vol. 13, no. 4, 39, pp. 1-25. https://doi.org/10.1145/3001589

The rise of general-purpose computing on GPUs has influenced architectural innovation on them. The introduction of an on-chip cache hierarchy is one such innovation. High L1 miss rates on GPUs, however, indicate inefficient cache usage due to myriad

External link: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::3b1a6e0dec93e95f86a2ec795bebfa19
https://www.pure.ed.ac.uk/ws/files/29959329/taco16_dublish_PURE_1.pdf

Characterizing memory bottlenecks in GPGPU workloads

Authors: Vijay Nagarajan, Nigel Topham, Saumay Dublish

Published in: Dublish, S, Nagarajan, V & Topham, N 2016, Characterizing memory bottlenecks in GPGPU workloads. in 2016 IEEE International Symposium on Workload Characterization (IISWC). Institute of Electrical and Electronics Engineers (IEEE), Providence, RI, USA, pp. 1-2, 2016 IEEE International Symposium on Workload Characterization, Providence, United States, 25/09/16. https://doi.org/10.1109/IISWC.2016.7581287
IISWC

GPUs are often limited by the off-chip memory bandwidth. With the advent of general-purpose computing on GPUs, cache hierarchy has been introduced to filter the bandwidth demand to the off-chip memory. However, the cache hierarchy presents its own ba

External link: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::6f46d50079f754d7e79f7450f125f08f
https://www.pure.ed.ac.uk/ws/files/28417670/103_Dublish_PID4414159_1.pdf

Student Research Poster

Author: Saumay Dublish

Published in: PACT

Due to lack of sufficient compute threads in memory-intensive applications, GPUs often exhaust all the active warps and therefore, the memory latencies get exposed and appear in the critical path. In such a scenario, the shared on-chip and off-chip m

External link: https://explore.openaire.eu/search/publication?articleId=doi_________::3557945ed131e7c58cc2141feea78835
https://doi.org/10.1145/2967938.2971470
