High-Throughput CNN Inference on Embedded ARM big.LITTLE Multicore Processors

Authors:	Anuj Pathania, Neeraj Goel, Gayathri Ananthanarayanan, Tulika Mitra, Siqi Wang, Yifan Zeng
Year of publication:	2020
Subject:	FOS: Computer and information sciences; Computer Science - Machine Learning; Multi-core processor; Computer Science - Performance; Edge device; Computer science; Design space exploration; Pipeline (computing); 02 engineering and technology; Parallel computing; Computer Graphics and Computer-Aided Design; Convolutional neural network; Machine Learning (cs.LG); 020202 computer hardware & architecture; Performance (cs.PF); Computer Science - Distributed, Parallel, and Cluster Computing; 0202 electrical engineering, electronic engineering, information engineering; Overhead (computing); Distributed, Parallel, and Cluster Computing (cs.DC); Enhanced Data Rates for GSM Evolution; Electrical and Electronic Engineering; Throughput (business); Software
Source:	IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. 39:2254-2267
ISSN:	1937-4151 0278-0070
Description:	IoT Edge intelligence requires Convolutional Neural Network (CNN) inference to take place on the edge devices themselves. The ARM big.LITTLE architecture is at the heart of prevalent commercial edge devices. It comprises single-ISA heterogeneous cores grouped into multiple homogeneous clusters that enable power and performance trade-offs. All cores are expected to be employed simultaneously in inference to attain maximal throughput. However, the high communication overhead involved in parallelizing the computations of convolution kernels across clusters is detrimental to throughput. We present an alternative framework called Pipe-it that employs a pipelined design to split convolutional layers across clusters while limiting parallelization of their respective kernels to the assigned cluster. We develop a performance-prediction model that uses only the convolutional layer descriptors to predict the execution time of each layer individually on all permitted core configurations (type and count). Pipe-it then exploits the predictions to create a balanced pipeline using an efficient design space exploration algorithm. On average, Pipe-it achieves 39% higher throughput than the highest throughput attained by prior approaches. Comment: Accepted to IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::88cb1a98c80899906e7e968a91ace14e https://doi.org/10.1109/tcad.2019.2944584
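
Note on the description: the idea of building a balanced pipeline from per-layer time predictions can be illustrated with a minimal sketch. This is not the Pipe-it algorithm itself, whose details are not given in this record. It assumes a hypothetical two-stage pipeline (one big cluster, one LITTLE cluster), hypothetical predicted layer times in milliseconds, and a brute-force search over the split point. The function name `best_two_stage_split` and all numbers are illustrative assumptions.

```python
from typing import List, Tuple


def best_two_stage_split(big_ms: List[float],
                         little_ms: List[float]) -> Tuple[int, float]:
    """Pick the layer index that best balances a two-stage pipeline.

    Layers [0, split) run on the big cluster and layers [split, n) run on
    the LITTLE cluster. The inputs are (hypothetical) predicted per-layer
    execution times on each cluster. Pipeline throughput is limited by the
    slower stage, so the bottleneck stage time is minimized.
    """
    if not big_ms or len(big_ms) != len(little_ms):
        raise ValueError("need two non-empty lists of equal length")

    best_split, best_bottleneck = 0, float("inf")
    for split in range(len(big_ms) + 1):
        stage_big = sum(big_ms[:split])          # time of stage 1
        stage_little = sum(little_ms[split:])    # time of stage 2
        bottleneck = max(stage_big, stage_little)
        if bottleneck < best_bottleneck:
            best_split, best_bottleneck = split, bottleneck
    return best_split, best_bottleneck


# Illustrative predicted times (ms) for four convolutional layers.
big = [4.0, 6.0, 5.0, 3.0]
little = [10.0, 15.0, 12.0, 8.0]
split, bottleneck = best_two_stage_split(big, little)
print(split, bottleneck)  # 3 15.0
```

With these made-up numbers, running the first three layers on the big cluster and the last one on the LITTLE cluster gives a 15 ms bottleneck, compared with 18 ms when the big cluster runs every layer alone. A real design space exploration would also vary the number of stages and the core count per stage.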