Helper threads via virtual multithreading on an experimental Itanium® 2 processor-based platform
Authors: | Terry Sych, Perry Wang, Jamison D. Collins, Bill Greene, Kai-Ming Chan, Stephen F. Moore, John Paul Shen, Dongkeun Kim, Aamir B. Yunus, Hong Wang |
---|---|
Year of publication: | 2004 |
Subject: |
Instruction prefetch; Speedup; CPU cache; Computer science; Hyper-threading; Thread (computing); Parallel computing; General Medicine; ComputerSystemsOrganization_PROCESSORARCHITECTURES; Simultaneous multithreading; computer.software_genre; Computer Graphics and Computer-Aided Design; Super-threading; Multithreading; Operating system; General Earth and Planetary Sciences; Itanium; Compiler; Cache; computer; Temporal multithreading; Software; General Environmental Science |
Source: | ASPLOS |
ISSN: | 0163-5964 |
Description: | Helper threading is a technology to accelerate a program by exploiting a processor's multithreading capability to run "assist" threads. Previous experiments on hyper-threaded processors have demonstrated significant speedups by using helper threads to prefetch hard-to-predict delinquent data accesses. In order to apply this technique to processors that do not have built-in hardware support for multithreading, we introduce virtual multithreading (VMT), a novel form of switch-on-event user-level multithreading, capable of fly-weight multiplexing of event-driven thread executions on a single processor without additional operating system support. The compiler plays a key role in minimizing synchronization cost by judiciously partitioning register usage among the user-level threads. The VMT approach makes it possible to launch dynamic helper thread instances in response to long-latency cache miss events, and to run helper threads in the shadow of cache misses when the main thread would be otherwise stalled. The concept of VMT is prototyped on an Itanium® 2 processor using features provided by the Processor Abstraction Layer (PAL) firmware mechanism already present in currently shipping processors. On a 4-way MP physical system equipped with VMT-enabled Itanium 2 processors, helper threading via the VMT mechanism can achieve significant performance gains for a diverse set of real-world workloads, ranging from single-threaded workstation benchmarks to heavily multithreaded large-scale decision support systems (DSS) using the IBM DB2 Universal Database. We measure a wall-clock speedup of 5.8% to 38.5% for the workstation benchmarks, and 5.0% to 12.7% on various queries in the DSS workload. |
Database: | OpenAIRE |
External link: |