OR-ML: Enhancing Reliability for Machine Learning Accelerator with Opportunistic Redundancy

Authors:	Zhibin Yu, Wenxuan Chen, Zheng Wang, Bo Dong, Chao Chen, Yongkui Yang
Year of publication:	2021
Subject:	Fabric computing, Artificial neural network, Computer science, Reliability (computer networking), Fault injection, Machine learning, Parallel processing (DSP implementation), Redundancy (engineering), Overhead (computing), Artificial intelligence, Field-programmable gate array
Source:	DATE
Description:	Reliability plays a central role in deep sub-micron and nanometre IC fabrication technology and has recently been reported to be one of the key issues affecting the inference phase of neural networks. State-of-the-art machine learning (ML) accelerators exploit the massive computing parallelism observed in neural networks to achieve high energy efficiency. The computing fabric of ML engines, which consists of large arrays of processing elements (PEs), has been growing dramatically to accommodate the huge size and heterogeneity of rapidly evolving ML algorithms. However, it is commonly observed that activations of zero value lead to reduced PE utilization. In this work, we present a novel and low-cost approach, named OR-ML, to enhance the reliability of generic ML accelerators by Opportunistically exploring the chances of runtime Redundancy provided by neighbouring PEs. In contrast to conventional redundancy techniques, the proposed technique introduces no additional computing resources, and therefore significantly reduces the implementation overhead while still achieving a clear level of protection. The design prototype is evaluated using emulated fault injection on FPGA, executing mainstream neural networks for object classification and detection.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::10099bf48400156dd5476ab437baa351 https://doi.org/10.23919/date51398.2021.9474016
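
Illustrative note: the abstract describes reusing PEs that are left idle by zero-valued activations as runtime redundancy for their neighbours. The sketch below is only a toy software model of that general idea, not the authors' design. The pairing rule (each active PE is checked by its right-hand neighbour when that neighbour is idle), the additive fault model, and the names `mac` and `protected_row` are all assumptions made for illustration.

```python
from typing import List, Tuple


def mac(weight: int, activation: int, fault: int = 0) -> int:
    """One PE multiply. `fault` is a hypothetical additive error that emulates a fault."""
    return weight * activation + fault


def protected_row(
    weights: List[int], activations: List[int], faults: List[int]
) -> List[Tuple[int, bool, bool]]:
    """Return (result, checked, mismatch) for every PE in one row.

    Assumed rule: a PE with a nonzero activation is re-computed by its
    right-hand neighbour (wrapping around) only if that neighbour is idle,
    i.e. the neighbour's own activation is zero.
    """
    n = len(weights)
    out = []
    for i in range(n):
        primary = mac(weights[i], activations[i], faults[i])
        j = (i + 1) % n
        checked = j != i and activations[i] != 0 and activations[j] == 0
        mismatch = False
        if checked:
            # The idle neighbour repeats the same multiply on its own hardware.
            shadow = mac(weights[i], activations[i], faults[j])
            mismatch = shadow != primary
        out.append((primary, checked, mismatch))
    return out


# PE 0 carries an emulated fault; PE 1 is idle and can therefore check PE 0.
results = protected_row(
    weights=[2, 3, 4, 5],
    activations=[1, 0, 2, 3],
    faults=[7, 0, 0, 0],
)
coverage = sum(1 for _, checked, _ in results if checked)
```

In this toy run only PE 0 has an idle neighbour, so only its fault is detected; PEs 2 and 3 stay unprotected because their neighbours are busy. That is the sense in which the redundancy is opportunistic: coverage depends on how many zero activations occur at runtime, and no extra PEs are added.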