A role-changeable fault-tolerant management strategy towards resilient NoC-based manycore systems

Authors: Yu Lu, Jinxiang Wang, Fangfa Fu, Zixu Wu
Publication year: 2015
Subject:
Source: Microelectronics Journal. 46:1371-1379
ISSN: 0026-2692
DOI: 10.1016/j.mejo.2015.09.010
Description: In hierarchically managed network-on-chip (NoC) based manycore systems with over one thousand processor cores, resource management is critical to the efficient operation of the whole system. The fault-tolerant design of the management structure is as important as fault-tolerant design at the micro-structural level and the system level, and it deserves more attention in the construction of resilient manycore systems. This paper presents RCFTM, a hierarchical, agent-based, role-changeable fault-tolerant management strategy built on a modified role-changeable management framework for resilient NoC-based manycore systems. It is a distributed and adaptive strategy that can reconstruct the management hierarchy dynamically, which gives it an intrinsic ability to tolerate various kinds of damage to the management hierarchy caused by permanent core faults. The fault-tolerant capability of the RCFTM strategy is first evaluated in MATLAB for a large-scale manycore system. Results show that the RCFTM strategy has better fault-tolerant capability than the conventional replacement strategy, which relies only on backup cores. The RCFTM strategy is then implemented in C on a 5×5 NoC-based manycore system using a full-system simulation platform. Experiments show that, with a footprint of 20 KB of ROM and 35.6 KB of RAM for the RCFTM strategy on each agent, the system can successfully tolerate both single and multiple core faults. Results also show that the RCFTM strategy adds less than 1.48% computing overhead to a working agent while the system is fault-free.
Database: OpenAIRE