Latency tolerance for throughput computing

Authors: Chien-Ping Lu, Brian Ko
Year of publication: 2012
Subject:
Source: Proceedings of the International Conference on Computer-Aided Design.
DOI: 10.1145/2429384.2429496
Description: In Throughput Computing, data can be processed independently by a large number of threads running similar programs, referred to as kernels, or as shaders for graphics-specific workloads. A Throughput Computing device, such as a GPU, requires task latency tolerance to hold the context of the outstanding threads, and data latency tolerance to hold space for memory requests issued by those threads. The threads are grouped into thread groups. The register file and the associated number of outstanding thread groups should be sized according to the ratio of computing resources to load/store units. This ratio should reflect the balance between ALU and load/store instructions in the target workload.
Database: OpenAIRE
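
Note: The sizing argument in the description can be illustrated with a rough back-of-envelope estimate. The sketch below is not the authors' model; it assumes a simple latency-hiding rule in which other thread groups must supply enough ALU work to cover one group's memory wait. The function names, the 400-cycle latency, the 20 ALU cycles per memory request, the 32 threads per group, and the 16 registers per thread are all hypothetical values chosen for illustration.

```python
import math


def thread_groups_needed(mem_latency_cycles, alu_cycles_per_mem_request):
    """Estimate outstanding thread groups needed to hide memory latency.

    Assumption (not from the paper): each group performs
    `alu_cycles_per_mem_request` cycles of ALU work between memory
    requests, and the ALU stays busy only if other groups have work
    while one group waits. The +1 counts the waiting group itself.
    """
    if alu_cycles_per_mem_request <= 0:
        raise ValueError("alu_cycles_per_mem_request must be positive")
    return math.ceil(mem_latency_cycles / alu_cycles_per_mem_request) + 1


def register_file_bytes(groups, threads_per_group, regs_per_thread,
                        bytes_per_reg=4):
    """Register file capacity needed to hold the context of all groups."""
    return groups * threads_per_group * regs_per_thread * bytes_per_reg


# Hypothetical workload: 400-cycle memory latency, 20 ALU cycles per request.
groups = thread_groups_needed(400, 20)
size = register_file_bytes(groups, threads_per_group=32, regs_per_thread=16)
print(groups, size)
```

Under these assumptions, a workload with fewer ALU instructions per load/store needs more outstanding thread groups, and therefore a larger register file, to tolerate the same memory latency.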