DORY: Automatic End-to-End Deployment of Real-World DNNs on Low-Cost IoT MCUs

Authors: Angelo Garofalo, Giuseppe Tagliavini, Nazareno Bruschi, Alessio Burrello, Francesco Conti, Davide Rossi
Contributors: Burrello, Alessio; Garofalo, Angelo; Bruschi, Nazareno; Tagliavini, Giuseppe; Rossi, Davide; Conti, Francesco
Year of publication: 2021
Subject:
FOS: Computer and information sciences
IoT
Micromechanical devices
Computer science
Acceleration
02 engineering and technology
01 natural sciences
Deep neural networks
edge computing
DNN acceleration
Theoretical Computer Science
Tools
Hardware Architecture (cs.AR)
Dory
Constraint programming
System on a chip
Computer architecture
Neural and Evolutionary Computing (cs.NE)
Static random-access memory
Computer Science - Hardware Architecture
ANSI C
Memory hierarchy
010401 analytical chemistry
Computer Science - Neural and Evolutionary Computing
021001 nanoscience & nanotechnology
0104 chemical sciences
Memory management
Computer Science - Distributed, Parallel, and Cluster Computing
Computational Theory and Mathematics
Hardware and Architecture
Embedded system
Task analysis
Distributed, Parallel, and Cluster Computing (cs.DC)
0210 nano-technology
Software
System-on-chip
Source: IEEE Transactions on Computers
DOI: 10.1109/tc.2021.3066883
Description: The deployment of Deep Neural Networks (DNNs) on end-nodes at the extreme edge of the Internet-of-Things is a critical enabler to support pervasive Deep Learning-enhanced applications. Low-cost MCU-based end-nodes have limited on-chip memory and often replace caches with scratchpads to reduce area overheads and increase energy efficiency, which requires explicit DMA-based memory transfers between different levels of the memory hierarchy. Mapping modern DNNs on these systems requires aggressive topology-dependent tiling and double-buffering. In this work, we propose DORY (Deployment Oriented to memoRY), an automatic tool to deploy DNNs on low-cost MCUs with typically less than 1 MB of on-chip SRAM. DORY abstracts tiling as a Constraint Programming (CP) problem: it maximizes L1 memory utilization under the topological constraints imposed by each DNN layer. Then, it generates ANSI C code to orchestrate off- and on-chip transfers and computation phases. Furthermore, to maximize speed, DORY augments the CP formulation with heuristics promoting performance-effective tile sizes. As a case study for DORY, we target GreenWaves Technologies GAP8, one of the most advanced parallel ultra-low-power MCU-class devices on the market. On this device, DORY achieves up to 2.5x better MAC/cycle than the GreenWaves proprietary software solution and 18.1x better than the state-of-the-art result on an STM32-F746 MCU on single layers. Using our tool, GAP8 can perform end-to-end inference of a 1.0-MobileNet-128 network consuming just 63 pJ/MAC on average at 4.3 fps, 15.4x better than an STM32-F746. We release all our developments (the DORY framework, the optimized backend kernels, and the related heuristics) as open-source software.
14 pages, 12 figures, 4 tables, 2 listings. Accepted for publication in IEEE Transactions on Computers (https://ieeexplore.ieee.org/document/9381618)
Database: OpenAIRE
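
Illustrative note: The description states that tiling is cast as a constrained optimization that maximizes L1 memory utilization for each layer. The sketch below is only a rough illustration of that idea, not DORY's actual formulation or code. It assumes a plain convolution layer with 8-bit data, doubles every buffer to account for double-buffering, keeps all input channels in each tile, and replaces a CP solver with an exhaustive search. The names tile_bytes and best_tile, and the memory model itself, are hypothetical.

```python
from itertools import product


def tile_bytes(c_in, c_out_t, h_t, w_t, k=3, stride=1):
    """Bytes one tile occupies in L1 (assumed model: 8-bit data, no padding).

    The input patch is the receptive field needed to produce an
    h_t x w_t output tile; all c_in input channels are kept in the tile.
    """
    h_in = (h_t - 1) * stride + k
    w_in = (w_t - 1) * stride + k
    input_bytes = c_in * h_in * w_in
    output_bytes = c_out_t * h_t * w_t
    weight_bytes = c_out_t * c_in * k * k
    return input_bytes + output_bytes + weight_bytes


def best_tile(c_in, c_out, h, w, l1_budget, k=3, stride=1):
    """Return (used_bytes, c_out_t, h_t, w_t) maximizing L1 use, or None.

    Every buffer is counted twice (double-buffering), and the total must
    fit in l1_budget. Exhaustive search stands in for a CP solver here.
    """
    best = None
    for c_out_t, h_t, w_t in product(
        range(1, c_out + 1), range(1, h + 1), range(1, w + 1)
    ):
        used = 2 * tile_bytes(c_in, c_out_t, h_t, w_t, k, stride)
        if used > l1_budget:
            continue  # violates the L1 capacity constraint
        if best is None or used > best[0]:
            best = (used, c_out_t, h_t, w_t)
    return best


if __name__ == "__main__":
    # Hypothetical layer: 16 -> 32 channels, 16x16 output, 64 KiB of L1.
    print(best_tile(c_in=16, c_out=32, h=16, w=16, l1_budget=64 * 1024))
```

A real tool would add further constraints and objectives, such as the performance heuristics for tile sizes mentioned in the description, which this sketch leaves out.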