Exascale fault tolerance challenge and approaches

Author: Cameron B. Mcnairy
Year of publication: 2018
Subject:
Source: IRPS
DOI: 10.1109/irps.2018.8353563
Description: A geometrically increasing transistor count and a stagnant per-transistor fault profile make it challenging to deliver a minimum acceptable user experience on an Exascale-capable supercomputer. This situation propels fault-tolerant design priorities from the background to the foreground. Supercomputer fault tolerance must be a first-class design concern for Exascale systems and beyond. Myriad solutions exist and can touch every level of the system, from transistor to circuit, microarchitecture, architecture, OS/driver/library, and application. Supporting this global effort requires tools and methodologies precise enough to enable trade-offs against, and optimizations around, power and performance, with accuracy targeting at least 5 years into the future.
Database: OpenAIRE