Zeroploit
Authors: | Virat Agarwal, Aditya Ukarande, Marc Blackstein, Mark Stephenson, Shyam Murthy, Ram Rangan |
---|---|
Year of publication: | 2020 |
Subject: | Speedup; Computer science; Engineering and technology; Parallel computing; Fast path; Program optimization; Operand; Computer hardware & architecture; Hardware and Architecture; Path (graph theory); Electrical engineering, electronic engineering, information engineering; Code (cryptography); Profile-guided optimization; Artificial intelligence & image processing; Shader; Software; Information Systems |
Source: | ACM Transactions on Architecture and Code Optimization. 17:1-26 |
ISSN: | 1544-3973 1544-3566 |
DOI: | 10.1145/3394284 |
Description: | In this article, we first characterize register operand value locality in shader programs of modern gaming applications and observe that there is a high likelihood of one of the register operands of several multiply, logical-and, and similar operations being zero, dynamically. We provide intuition, examples, and a quantitative characterization for how zeros originate dynamically in these programs. Next, we show that this dynamic behavior can be gainfully exploited with a profile-guided code optimization called Zeroploit that transforms targeted code regions into a zero-(value-)specialized fast path and a default slow path. The fast path benefits from zero-specialization in two ways, namely: (a) the backward slice of the other operand of a given multiply or logical-and can be skipped dynamically, provided the only use of that other operand is in the given instruction, and (b) the forward slice of instructions originating at the given instruction can be zero-specialized, potentially triggering further backward slice specializations from operations of that forward slice as well. Such specialization helps the fast path avoid redundant dynamic computations as well as memory fetches, while the fast-slow versioning transform helps preserve functional correctness. With an offline value profiler and manually optimized shader programs, we demonstrate that Zeroploit is able to achieve an average speedup of 35.8% for targeted shader programs, amounting to an average frame-rate speedup of 2.8% across a collection of modern gaming applications on an NVIDIA® GeForce RTX™ 2080 GPU. |
Database: | OpenAIRE |
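The fast-slow versioning transform described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, which targets GPU shader code. The names `expensive_term`, `shade_baseline`, `shade_versioned`, and the `ambient` and `mask` parameters are hypothetical, and the sketch assumes `expensive_term` always returns a finite value, since under IEEE 754 arithmetic zero times infinity or NaN is not zero.

```python
import math

# Counts how often the backward slice (the expensive operand) is evaluated.
calls = {"expensive": 0}

def expensive_term(x: float) -> float:
    """Hypothetical stand-in for the backward slice of the other operand,
    e.g. a texture fetch plus arithmetic in a real shader. Always finite."""
    calls["expensive"] += 1
    return math.sin(x) * math.cos(x) + math.sqrt(abs(x))

def shade_baseline(ambient: float, mask: float, x: float) -> float:
    # Original code: the other operand is computed even when mask is zero.
    return ambient + mask * expensive_term(x)

def shade_versioned(ambient: float, mask: float, x: float) -> float:
    if mask == 0.0:
        # Fast path: the product is known to be zero, so the backward slice
        # (expensive_term) is skipped and the forward slice
        # (ambient + product) is specialized to just ambient.
        return ambient
    # Slow path: the unmodified original computation.
    return ambient + mask * expensive_term(x)

# Assumed sample data in which most mask values are zero.
masks = [0.0, 0.0, 1.0, 0.0, 0.5, 0.0]
inputs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]

baseline = [shade_baseline(0.2, m, x) for m, x in zip(masks, inputs)]
baseline_calls = calls["expensive"]

calls["expensive"] = 0
versioned = [shade_versioned(0.2, m, x) for m, x in zip(masks, inputs)]
versioned_calls = calls["expensive"]
```

Both versions produce the same results here, but the versioned one evaluates `expensive_term` only for the non-zero masks. Whether the added branch pays off depends on how often the operand is actually zero, which is why the abstract pairs the transform with a value profiler.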