Search results

Locality-Aware CTA Scheduling for Gaming Applications

Authors: Aditya Ukarande, Suryakant Patidar, Ram Rangan

Published in: ACM Transactions on Architecture and Code Optimization. 19:1-26

The compute work rasterizer or the GigaThread Engine of a modern NVIDIA GPU focuses on maximizing compute work occupancy across all streaming multiprocessors in a GPU while retaining design simplicity. In this article, we identify the operational asp

External link: https://explore.openaire.eu/search/publication?articleId=doi_________::69617ae69942d3ca68c4fe66bbacfad0
https://doi.org/10.1145/3477497

Cooperative Profile Guided Optimizations

Authors: Stephen W. Keckler, Mark Stephenson, Ram Rangan

Published in: Computer Graphics Forum. 40:71-83

External link: https://explore.openaire.eu/search/publication?articleId=doi_________::269933620ce0629dd1fc8aec57160752
https://doi.org/10.1111/cgf.14382

GPU Subwarp Interleaving

Authors: Sana Damani, Mark Stephenson, Ram Rangan, Daniel Johnson, Rishkul Kulkarni, Stephen W. Keckler

Published in: 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA).

External link: https://explore.openaire.eu/search/publication?articleId=doi_________::f0dc05060cb4bc3485c5e7a595e212cc
https://doi.org/10.1109/hpca53966.2022.00090

Countering Load-to-Use Stalls in the NVIDIA Turing GPU

Authors: Alexandre Joly, Ram Rangan, Naman Turakhia

Published in: IEEE Micro. 40:59-66

Among its various improvements over prior NVIDIA GPUs, the NVIDIA Turing GPU boasts of four key performance enhancements to effectively counter memory load-to-use stalls. First, reduced latency on L1 hits for global memory loads helps lower average m

External link: https://explore.openaire.eu/search/publication?articleId=doi_________::7f251fa868919fdcedf768acf3f77cd9
https://doi.org/10.1109/mm.2020.3012514

Zeroploit

Authors: Virat Agarwal, Aditya Ukarande, Marc Blackstein, Mark Stephenson, Shyam Murthy, Ram Rangan

Published in: ACM Transactions on Architecture and Code Optimization. 17:1-26

In this article, we first characterize register operand value locality in shader programs of modern gaming applications and observe that there is a high likelihood of one of the register operands of several multiply, logical-and, and similar operatio

External link: https://explore.openaire.eu/search/publication?articleId=doi_________::f06229b205af521f3bd736d1e8de7699
https://doi.org/10.1145/3394284

PGZ: automatic zero-value code specialization

Authors: Mark Stephenson, Ram Rangan

Published in: CC

In prior work we proposed Zeroploit, a transform that duplicates code, specializes one path assuming certain key program operands, called versioning variables, are zero, and leaves the other path unspecialized. Dynamically, depending on the versionin

External link: https://explore.openaire.eu/search/publication?articleId=doi_________::f7185cc6fab5ea006eb17fbad13f02f3
https://doi.org/10.1145/3446804.3446845

Corrections to 'Countering Load-to-Use Stalls in the NVIDIA Turing GPU'

Authors: Alexandre Joly, Naman Turakhia, Ram Rangan

Published in: IEEE Micro. 41:83-83

External link: https://explore.openaire.eu/search/publication?articleId=doi_________::505f29be0f297862ede82e45d9a5ab23
https://doi.org/10.1109/mm.2020.3046136

Mesoscale performance simulation of multicore processor systems

Authors: Mike Kistler, Ram Rangan, Tibor Kiss, Peter Altevogt

Published in: Software & Systems Modeling. 12:731-744

Modern microprocessor design relies heavily on detailed full-chip performance simulations to evaluate complex trade-offs. Typically, different design alternatives are tried out for a specific sub-system or component, while keeping the rest of the sys

External link: https://explore.openaire.eu/search/publication?articleId=doi_________::b90a0b5aa0d883b0b09e5d9ee5cebb5f
https://doi.org/10.1007/s10270-012-0231-6

Performance scalability of decoupled software pipelining

Authors: David I. August, Ram Rangan, Guilherme Ottoni, Neil Vachharajani

Published in: ACM Transactions on Architecture and Code Optimization. 5:1-25

Any successful solution to using multicore processors to scale general-purpose program performance will have to contend with rising intercore communication costs while exposing coarse-grained parallelism. Recently proposed pipelined multithreading (P

External link: https://explore.openaire.eu/search/publication?articleId=doi_________::3d5240d8ea10f140d114aca4ed5f787a
https://doi.org/10.1145/1400112.1400113

Software-controlled fault tolerance

Authors: Jonathan Chang, Neil Vachharajani, Ram Rangan, David I. August, Shubhendu S. Mukherjee, George A. Reis

Published in: ACM Transactions on Architecture and Code Optimization. 2:366-396

Traditional fault-tolerance techniques typically utilize resources ineffectively because they cannot adapt to the changing reliability and performance demands of a system. This paper proposes software-controlled fault tolerance, a concept allowing de

External link: https://explore.openaire.eu/search/publication?articleId=doi_________::f32d044cd37e673fbc8b2c253f4d82fc
https://doi.org/10.1145/1113841.1113843
