OR-ML: Enhancing Reliability for Machine Learning Accelerator with Opportunistic Redundancy
Author: | Zhibin Yu, Wenxuan Chen, Zheng Wang, Bo Dong, Chao Chen, Yongkui Yang |
---|---|
Year of publication: | 2021 |
Subject: |
Fabric computing; Artificial neural network; Computer science; Reliability (computer networking); Fault injection; Machine learning; Parallel processing (DSP implementation); Redundancy (engineering); Overhead (computing); Artificial intelligence; Field-programmable gate array |
Source: | DATE |
Description: | Reliability plays a central role in deep sub-micron and nanometre IC fabrication technology and has recently been reported to be one of the key issues affecting the inference phase of neural networks. State-of-the-art machine learning (ML) accelerators exploit the massive computing parallelism observed in neural networks to achieve high energy efficiency. The computing fabric of ML engines, which consists of large arrays of processing elements (PEs), has been growing dramatically to accommodate the huge size and heterogeneity of rapidly evolving ML algorithms. However, it is commonly observed that activations of zero value lead to reduced PE utilization. In this work, we present a novel and low-cost approach, named OR-ML, to enhance the reliability of generic ML accelerators by Opportunistically exploring the chances of runtime Redundancy provided by neighbouring PEs. In contrast to conventional redundancy techniques, the proposed technique introduces no additional computing resources and therefore significantly reduces the implementation overhead while still achieving a clear level of protection. The design prototype is evaluated using emulated fault injection on FPGA, executing mainstream neural networks for object classification and detection. |
Database: | OpenAIRE |
External link: |
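
The description states that PEs receiving zero-valued activations are underused and that neighbouring PEs can supply runtime redundancy without extra hardware. The following is a toy software sketch of that general idea only, not the authors' design: the names `pe_multiply`, `run_row`, and `fault_mask`, the one-dimensional PE row, the nearest-neighbour pairing policy, and the detect-only comparison are all assumptions made for illustration.

```python
from typing import List, Tuple


def pe_multiply(activation: int, weight: int, fault_mask: int = 0) -> int:
    """Model one PE multiply; XOR-ing fault_mask emulates an injected bit fault."""
    return (activation * weight) ^ fault_mask


def run_row(
    activations: List[int], weights: List[int], fault_masks: List[int]
) -> Tuple[List[int], List[bool], List[bool]]:
    """Run one cycle over a row of PEs (hypothetical model).

    Returns (products, checked, detected), one entry per PE:
      products[i]  result produced by PE i (0 if the PE was idle)
      checked[i]   True if an idle neighbour recomputed PE i's product
      detected[i]  True if that recomputation disagreed with PE i
    """
    n = len(activations)
    # Idle PEs (zero activation) skip their own multiply in this model.
    products = [
        pe_multiply(a, w, f) if a != 0 else 0
        for a, w, f in zip(activations, weights, fault_masks)
    ]
    checked = [False] * n
    detected = [False] * n
    borrowed = [False] * n  # an idle PE can shadow only one neighbour per cycle

    for i in range(n):
        if activations[i] == 0:
            continue  # nothing to protect on an idle PE
        for j in (i - 1, i + 1):  # look left, then right
            if 0 <= j < n and activations[j] == 0 and not borrowed[j]:
                # Assumption: the shadow computation itself is fault-free.
                shadow = pe_multiply(activations[i], weights[i])
                checked[i] = True
                detected[i] = shadow != products[i]
                borrowed[j] = True
                break
    return products, checked, detected


if __name__ == "__main__":
    # PE 1 is idle, so it can shadow PE 0; PE 3 has no idle neighbour.
    print(run_row([3, 0, 5, 7], [2, 4, 6, 8], [1, 0, 0, 4]))
```

In this example the fault injected into PE 0 is caught because PE 1 is idle, while the fault in PE 3 goes unnoticed because no idle neighbour is available. That is the sense in which such protection is opportunistic: coverage depends on how many zero activations occur at runtime.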