Popis: |
The thick control flow (TCF) model is a data-parallel abstraction of the thread model. It merges homogeneous threads (called fibers) that flow through the same control path into entities (called TCFs) that have a single control flow and multiple data flows. The fibers of a TCF execute synchronously with respect to each other, and their number can be altered dynamically at runtime. Multiple TCFs can be executed in parallel to support control parallelism. In our previous work, we outlined a special architecture, TPA (Thick control flow Processor Architecture), for executing TCF programs efficiently. We also showed that designing algorithms with the TCF model often leads to higher performance and simpler programs, thanks to the higher level of abstraction and the elimination of loops and redundant program elements. Compute-update memory operations, such as multioperations and atomic instructions, are known to speed up parallel algorithms that perform reductions and synchronizations. In this paper, we propose special compute-update memory operations for TCF processors that optimize iterative, exclusive inter-fiber memory access patterns. Acceleration is achieved, e.g., in matrix addition and log-prefix-style patterns, in which multiple target locations can interchange data without the reloads between instructions that would otherwise slow down execution. Our solution is based on modified active memory units and special memory operations that can send their reply value to a fiber other than the one that initiated the access. We implement these operations in our TPA processor at minimal hardware cost and show that the expected speedups are achieved. Programming examples are given. |
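To illustrate why routing a memory reply to a different fiber can remove reloads in a log-prefix pattern, here is a minimal Python sketch. It uses a deliberately simplified model: one shared memory cell per fiber, fibers executing in lockstep, and a Hillis-Steele style inclusive prefix sum. The function names, the operation counting, and the exact "add, then reply to the target fiber" semantics are hypothetical illustrations. They are not TPA's actual instruction set or timing model. In the conventional version, every active fiber must reload a neighbour's cell and then store its own result at each step. In the redirected version, fiber `i` issues a single add to cell `i + d`, and the active memory unit sends the updated value straight to fiber `i + d`.

```python
from typing import List, Tuple


def conventional_log_prefix(values: List[int]) -> Tuple[List[int], int]:
    """Inclusive prefix sum with ordinary loads and stores (lockstep fibers).

    Each step has a load phase (fiber i reloads mem[i - d]) and a store phase
    (fiber i writes its new value back), i.e. two memory operations per
    active fiber per step.
    """
    n = len(values)
    mem = list(values)  # shared memory: one cell per fiber
    ops = 0
    d = 1
    while d < n:
        # Load phase: all reads complete before any write (synchronous fibers).
        loaded = {i: mem[i - d] for i in range(d, n)}
        ops += len(loaded)
        # Store phase: fiber i adds the loaded value and stores mem[i].
        for i, t in loaded.items():
            mem[i] += t
        ops += len(loaded)
        d *= 2
    return mem, ops


def redirected_log_prefix(values: List[int]) -> Tuple[List[int], int]:
    """Same prefix sum using a hypothetical add-and-redirect memory operation.

    Fiber i sends its register value to cell i + d. The (modeled) active
    memory unit performs the add and routes the reply to fiber i + d, which
    therefore never has to reload its cell: one operation per fiber per step.
    """
    n = len(values)
    mem = list(values)   # shared memory cells
    regs = list(values)  # each fiber's private register copy
    ops = 0
    d = 1
    while d < n:
        replies = {}
        for i in range(n - d):
            target = i + d
            mem[target] += regs[i]         # compute-update at the memory unit
            replies[target] = mem[target]  # reply goes to fiber `target`, not i
            ops += 1
        # Replies are delivered after the step, preserving lockstep semantics.
        for fiber, value in replies.items():
            regs[fiber] = value
        d *= 2
    return regs, ops


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6, 7, 8]
    print(conventional_log_prefix(data))
    print(redirected_log_prefix(data))
```

For eight fibers, both versions produce `[1, 3, 6, 10, 15, 21, 28, 36]`. Under this toy counting model, the conventional version issues 34 memory operations and the redirected version issues 17. The real speedup on TPA depends on the actual pipeline and memory latencies, which this sketch does not model.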