Monitoring the Health of Emerging Neural Network Accelerators with Cost-effective Concurrent Test.

Authors: Qi Liu, Tao Liu, Zihao Liu, Wujie Wen, Chengmo Yang
Subject:
Source: DAC: Annual ACM/IEEE Design Automation Conference; 2020, Issue 57, p776-781, 6p
Abstract: ReRAM-based neural network accelerator is a promising solution to handle the memory- and computation-intensive deep learning workloads. However, it suffers from unique device errors. These errors can accumulate to massive levels during the run time and cause significant accuracy drop. It is crucial to obtain its fault status in real-time before any proper repair mechanism can be applied. However, calibrating such statistical information is non-trivial because of the need of a large number of test patterns, long test time, and high test coverage considering that complex errors may appear in million-to-billion weight parameters. In this paper, we leverage the concept of corner data that can significantly confuse the decision making of neural network model, as well as the training algorithm, to generate only a small set of test patterns that is tuned to be sensitive to different levels of error accumulation and accuracy loss. Experimental results show that our method can quickly and correctly report the fault status of a running accelerator, outperforming existing solutions in both detection efficiency and cost. [ABSTRACT FROM AUTHOR]
Database: Complementary Index