Exascale fault tolerance challenge and approaches
Author: | Cameron B. Mcnairy |
---|---|
Year of publication: | 2018 |
Subject: | Computer science; Reliability (computer networking); Transistor; 020206 networking & telecommunications; Fault tolerance; 02 engineering and technology; Fault (power engineering); Supercomputer; User experience design; Transistor count; Embedded system; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Architecture |
Source: | IRPS |
DOI: | 10.1109/irps.2018.8353563 |
Description: | A geometrically increasing transistor count and a stagnant fault-per-transistor profile create a challenge in delivering a minimum acceptable user experience for the Exascale-capable supercomputer. This situation propels fault-tolerant design priorities from the background to the foreground. Supercomputer fault tolerance must be a first-class design concern for systems at Exascale and beyond. Myriad solutions exist and can touch each level of the system: transistor, circuit, micro-architecture, architecture, OS/driver/library, and application. Supporting this global effort requires tools and methodologies precise enough to enable trade-offs against, and optimizations around, power and performance, with accuracy that holds at least 5 years into the future. |
Database: | OpenAIRE |
External link: |