Trade-Off Between Performance, Fault Tolerance and Energy Consumption in Duplication-Based Taskgraph Scheduling

Autor: Simon Holmbacka, Jörg Keller, Patrick Eitschberger
Rok vydání: 2018
Předmět:
Zdroj: Lecture Notes in Computer Science ISBN: 9783319776095
ARCS
DOI: 10.1007/978-3-319-77610-1_1
Popis: Fault tolerance in parallel systems can be achieved by duplicating task executions onto several processing units, so in case one processing unit (PU) fails, the task can continue executing on another unit. Duplicating task execution affects the performance of the system in fault-free and fault cases, and its energy consumption. Currently, there are no tools for properly handling the three-variable optimization problem: Performance \(\leftrightarrow \) Fault Tolerance \(\leftrightarrow \) Energy Consumption, and no facilities for integrating it into an actual system. We present a fault-tolerant runtime system (called RUPS) for user defined schedules, in which the user can give their preferences about the trade-off between performance, energy and fault tolerance. We present an approach for determining the best trade-off for modern multicore architectures and we test RUPS on a real system to verify the accuracy of our approach itself.
Databáze: OpenAIRE