Showing 1 - 10 of 15 for search: '"Ram Rangan"'
Published in:
ACM Transactions on Architecture and Code Optimization. 19:1-26
The compute work rasterizer or the GigaThread Engine of a modern NVIDIA GPU focuses on maximizing compute work occupancy across all streaming multiprocessors in a GPU while retaining design simplicity. In this article, we identify the operational asp
Published in:
Computer Graphics Forum. 40:71-83
Authors:
Sana Damani, Mark Stephenson, Ram Rangan, Daniel Johnson, Rishkul Kulkarni, Stephen W. Keckler
Published in:
2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA).
Published in:
IEEE Micro. 40:59-66
Among its various improvements over prior NVIDIA GPUs, the NVIDIA Turing GPU boasts of four key performance enhancements to effectively counter memory load-to-use stalls. First, reduced latency on L1 hits for global memory loads helps lower average m
Published in:
ACM Transactions on Architecture and Code Optimization. 17:1-26
In this article, we first characterize register operand value locality in shader programs of modern gaming applications and observe that there is a high likelihood of one of the register operands of several multiply, logical-and, and similar operatio
Authors:
Mark Stephenson, Ram Rangan
Published in:
CC
In prior work we proposed Zeroploit, a transform that duplicates code, specializes one path assuming certain key program operands, called versioning variables, are zero, and leaves the other path unspecialized. Dynamically, depending on the versionin
Published in:
IEEE Micro. 41:83-83
Published in:
Software & Systems Modeling. 12:731-744
Modern microprocessor design relies heavily on detailed full-chip performance simulations to evaluate complex trade-offs. Typically, different design alternatives are tried out for a specific sub-system or component, while keeping the rest of the sys
Published in:
ACM Transactions on Architecture and Code Optimization. 5:1-25
Any successful solution to using multicore processors to scale general-purpose program performance will have to contend with rising intercore communication costs while exposing coarse-grained parallelism. Recently proposed pipelined multithreading (P
Authors:
Jonathan Chang, Neil Vachharajani, Ram Rangan, David I. August, Shubhendu S. Mukherjee, George A. Reis
Published in:
ACM Transactions on Architecture and Code Optimization. 2:366-396
Traditional fault-tolerance techniques typically utilize resources ineffectively because they cannot adapt to the changing reliability and performance demands of a system. This paper proposes software-controlled fault tolerance, a concept allowing de