Popis: |
Scalability of applications is a key requirement for gaining performance in hybrid and cluster computing. Implementing code that utilizes multiple accelerators and CPUs is difficult, particularly when dealing with dependencies, memory management, data locality, and processor occupancy. The Hybrid Task Graph Scheduler (HTGS) is designed to increase programmer productivity when developing applications for single nodes with multiple CPUs and accelerators. Current task graph schedulers provide APIs, directives, and compilers to schedule work on nodes; however, many fail to expose the locality of data and often use a single address space to represent memory, resulting in inefficient data transfer patterns for accelerators. HTGS merges dataflow and traditional task graph schedulers into a novel model that assists developers in exposing the parallelism and data locality of their algorithms. With the HTGS model, an algorithm is represented at a high level of abstraction, and its computationally intensive components are modularized as a series of concurrent tasks. Using this approach, the model explicitly defines memory for each address space and provides interfaces to express the locality of data between tasks. The result achieves the full performance of the node, comparable to best-of-breed implementations of algorithms such as matrix multiplication and LU decomposition. These performance gains are demonstrated with modest effort using the HTGS C++ API, which improves programmer productivity in obtaining that performance. |