A fault detection and recovery architecture for a teradevice dataflow system
Author: | Theo Ungerer, Avi Mendelson, Julian Wolf, Bernhard Fechner, Arne Garbade, Roberto Giorgi, Sebastian Weis |
---|---|
Language: | English |
Year of publication: | 2011 |
Subject: |
010302 applied physics; business.industry; Computer science; Dataflow; Process (computing); Fault tolerance; 02 engineering and technology; Parallel computing; 01 natural sciences; Fault detection and isolation; 020202 computer hardware & architecture; Instruction set; Multithreading; Embedded system; 0103 physical sciences; Synchronization (computer science); 0202 electrical engineering, electronic engineering, information engineering; business; Dataflow architecture |
Source: | BASE (Bielefeld Academic Search Engine) |
Description: | Future computing systems (Teradevices) will probably contain more than 1000 cores on a single die. Threaded dataflow execution models are a promising way to exploit this parallelism, since they provide side-effect-free execution and reduced synchronization overhead. However, the terascale transistor integration of such chips makes them orders of magnitude more vulnerable to voltage fluctuation, radiation, and process variations, so reliability techniques have to be an essential part of such future systems. In this paper, we conceptualize a fault-tolerant architecture for a scalable threaded dataflow system. We provide methods to detect permanent, intermittent, and transient faults during execution. Furthermore, we propose a recovery technique for dataflow threads. |
Database: | OpenAIRE |
External link: |