PULP-TrainLib: Enabling On-Device Training for RISC-V Multi-core MCUs Through Performance-Driven Autotuning
Author: Davide Nadalini, Manuele Rusci, Giuseppe Tagliavini, Leonardo Ravaglia, Luca Benini, Francesco Conti
Contributors: Nadalini D., Rusci M., Tagliavini G., Ravaglia L., Benini L., Conti F.
Language: English
Year of publication: 2022
Subject: distributed systems, parallel processing systems, engineering, microprocessor chips, computer systems, computer science, artificial intelligence, computer hardware, distributed computer systems, processors, computer programming, embedded systems, machine learning, computer networks, internet, signal processing
Source: Lecture Notes in Computer Science, ISBN 9783031150739
Description: An open challenge in making Internet-of-Things sensor nodes "smart" and self-adaptive is to enable on-chip Deep Neural Network (DNN) training on Ultra-Low-Power (ULP) microcontroller units (MCUs). To this end, we present a framework, based on PULP-TrainLib, to deploy DNN training tasks on RISC-V-based Parallel-ULP (PULP) MCUs. PULP-TrainLib is a library of parallel software DNN primitives enabling the execution of forward and backward steps on PULP MCUs. To optimize PULP-TrainLib's kernels, we propose a strategy to automatically select and configure (autotune) the fastest among a set of tiling options and optimized floating-point matrix multiplication kernels, according to the tensor shapes of every DNN layer. Results on an 8-core RISC-V MCU show that our autotuned primitives improve MAC/clk by up to 2.4x compared to a "one-size-fits-all" matrix multiplication, achieving up to 4.39 MAC/clk, 36.6x better than a commercial STM32L4 MCU executing the same DNN layer training workload. Furthermore, our strategy proves to be 30.7x faster than AIfES, a state-of-the-art training library for MCUs, while training a complete TinyML model.
Database: OpenAIRE
External link:
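
The autotuning strategy summarized in the description (benchmark a set of optimized matrix multiplication kernels and keep the fastest one for each layer's tensor shapes) can be illustrated with a minimal, self-contained C sketch. All names below (mm_naive, mm_unroll2, autotune, the example shape) are hypothetical and do not reflect the actual PULP-TrainLib API or its kernel set; the sketch only shows the shape-driven, measurement-based selection idea.

```c
/* Minimal sketch of shape-driven kernel autotuning (hypothetical names,
 * not the PULP-TrainLib API): time each candidate matmul kernel on the
 * layer's tensor shapes and keep the fastest one. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

typedef void (*matmul_fn)(const float *A, const float *B, float *C,
                          int M, int N, int K);

/* Candidate 1: straightforward triple loop. */
static void mm_naive(const float *A, const float *B, float *C,
                     int M, int N, int K) {
    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j++) {
            float acc = 0.0f;
            for (int k = 0; k < K; k++)
                acc += A[i * K + k] * B[k * N + j];
            C[i * N + j] = acc;
        }
}

/* Candidate 2: unroll the output columns by 2 to expose more
 * instruction-level parallelism. */
static void mm_unroll2(const float *A, const float *B, float *C,
                       int M, int N, int K) {
    for (int i = 0; i < M; i++) {
        int j = 0;
        for (; j + 1 < N; j += 2) {
            float acc0 = 0.0f, acc1 = 0.0f;
            for (int k = 0; k < K; k++) {
                float a = A[i * K + k];
                acc0 += a * B[k * N + j];
                acc1 += a * B[k * N + j + 1];
            }
            C[i * N + j]     = acc0;
            C[i * N + j + 1] = acc1;
        }
        for (; j < N; j++) {  /* remainder column */
            float acc = 0.0f;
            for (int k = 0; k < K; k++)
                acc += A[i * K + k] * B[k * N + j];
            C[i * N + j] = acc;
        }
    }
}

/* Benchmark every candidate on the given layer shape, return the fastest. */
static matmul_fn autotune(int M, int N, int K) {
    matmul_fn   candidates[] = { mm_naive, mm_unroll2 };
    const char *names[]      = { "naive", "unroll2" };
    const int   n_cand = sizeof(candidates) / sizeof(candidates[0]);
    const int   reps   = 10;  /* repeat to get a measurable duration */

    float *A = calloc((size_t)M * K, sizeof(float));
    float *B = calloc((size_t)K * N, sizeof(float));
    float *C = calloc((size_t)M * N, sizeof(float));

    matmul_fn best = candidates[0];
    double best_time = 1e30;
    for (int c = 0; c < n_cand; c++) {
        clock_t start = clock();
        for (int r = 0; r < reps; r++)
            candidates[c](A, B, C, M, N, K);
        double elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;
        printf("%-8s M=%d N=%d K=%d : %.6f s\n", names[c], M, N, K, elapsed);
        if (elapsed < best_time) { best_time = elapsed; best = candidates[c]; }
    }
    free(A); free(B); free(C);
    return best;
}

int main(void) {
    /* Example layer shape; the selected kernel would then be used for that
     * layer's forward and backward matrix multiplications. */
    matmul_fn mm = autotune(64, 64, 128);
    (void)mm;
    return 0;
}
```

In the paper's setting this selection would additionally consider tiling options and run on the multi-core PULP target; the sketch above only conveys the per-layer, measurement-driven choice among kernel variants.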