CNN-on-AWS: Efficient allocation of multikernel applications on Multi-FPGA platforms

Authors:	Jordi Cortadella, Mario R. Casu, Mihai Teodor Lazarescu, Junnan Shan, Luciano Lavagno
Contributors:	Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, Universitat Politècnica de Catalunya. ALBCOM - Algorismia, Bioinformàtica, Complexitat i Mètodes Formals
Year of publication:	2021
Subject:	Optimization; Computer science; Allocation; Multikernel; Cloud computing; 02 engineering and technology; Parallel computing; Convolutional neural network; Neural networks (Computer science); Multi-FPGA; Nonlinear programming; 0202 electrical engineering, electronic engineering, information engineering; Xarxes neuronals (Informàtica); Electrical and Electronic Engineering; Field-programmable gate array; Programació no lineal; Pipelines; business.industry; Resource management; Field programmable gate arrays; CNNs; Integer programming; Solver; Computer Graphics and Computer-Aided Design; Throughput; Kernel; Data transfer; Task analysis; 020202 computer hardware & architecture; Kernel (image processing); Informàtica::Informàtica teòrica [Àrees temàtiques de la UPC]; Convolutional neural networks; Programació en nombres enters; business; Software; Data transmission
Source:	UPCommons. Portal del coneixement obert de la UPC Universitat Politècnica de Catalunya (UPC)
Description:	Multi-FPGA platforms, such as Amazon AWS F1, can run multikernel pipelined applications, such as convolutional neural networks (CNNs), in the cloud with excellent performance and lower energy consumption than CPUs or GPUs. We propose a method to efficiently map these applications onto multi-FPGA platforms so as to maximize application throughput. For the given resources, our methodology finds the optimal number of parallel instances of each kernel in the pipeline and their allocation to one or more of the available FPGAs. We obtain this by formulating and solving a mixed-integer nonlinear optimization problem that models the performance of each component and the duration of the phases into which the accelerated computation can be split, namely: 1) data transfer from a host CPU to the DDR memory of each FPGA; 2) data transfer from FPGA DDR to FPGA on-chip memory; 3) kernel computation on the FPGA; 4) data transfer from FPGA on-chip memory to FPGA DDR; and 5) data transfer from FPGA DDR to host. Finding the optimal solution with a mixed-integer nonlinear programming (MINLP) solver is often highly inefficient. Hence, we provide a fast heuristic method that, according to our experiments, can be much more efficient than the MINLP solver while finding comparable results. For larger problems (more CNN layers), our heuristic method quickly finds (several thousand times faster) much better solutions than the MINLP solver, even when the latter is run for a very long time. This work was supported in part by the European Commission through the ECOSCALE Project under Grant H2020-ICT671632, in part by the Spanish Ministry for Economy and Competitiveness and the European Union (FEDER funds) under Grant TIN2017-86727-C2-1-R and Grant FPI 2015, and in part by the Generalitat de Catalunya under Grant 2017 SGR 786.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::eb0c355b001085a23402b794ab438b2e http://hdl.handle.net/2117/345016