A full featured configurable accelerator for object detection with YOLO
Authors: | Jose T. de Sousa, Daniel Pestana, Horácio C. Neto, Mário P. Véstias, Pedro R. Miranda, João D. Lopes, Rui Policarpo Duarte |
---|---|
Language: | English |
Year of publication: | 2021 |
Subject: | General Computer Science; Computer science; Object detection; Convolutional neural network; 02 engineering and technology; Kernel (linear algebra); 0202 electrical engineering, electronic engineering, information engineering; General Materials Science; Lightweight YOLO; Field-programmable gate array; FPGA; business.industry; General Engineering; Folding (DSP implementation); Object (computer science); Frame rate; 020202 computer hardware & architecture; TK1-9971; Task (computing); Scalability; 020201 artificial intelligence & image processing; Electrical engineering. Electronics. Nuclear engineering; business; Computer hardware |
Source: | Repositório Científico de Acesso Aberto de Portugal (RCAAP); IEEE Access, Vol. 9, pp. 75864-75877 (2021) |
Description: | Object detection and classification is an essential task of computer vision. A very efficient algorithm for detection and classification is YOLO (You Only Look Once). We consider hardware architectures to run YOLO in real time on embedded platforms. Designing a new dedicated accelerator for each new version of YOLO is not feasible given how quickly new versions are released. This work’s primary goal is to design a configurable and scalable core for creating specific object detection and classification systems based on YOLO, targeting embedded platforms. The core accelerates the execution of all the algorithm steps, including pre-processing, model inference and post-processing. It employs a fixed-point format, linearised activation functions, batch-normalisation, folding, and a hardware structure that exploits most of the available parallelism in CNN processing. The proposed core is configured for real-time execution of YOLOv3-Tiny and YOLOv4-Tiny, integrated into a RISC-V-based system-on-chip architecture and prototyped on an UltraScale XCKU040 FPGA (Field-Programmable Gate Array). With a 16-bit fixed-point format, the solution achieves 32 and 31 frames per second for YOLOv3-Tiny and YOLOv4-Tiny, respectively. Compared to previous proposals, it achieves a higher frame rate with higher performance efficiency. The performance, area efficiency and configurability of the proposed core enable the fast development of real-time YOLO-based object detectors on embedded systems. |
Database: | OpenAIRE |
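The description lists batch-normalisation, folding and a 16-bit fixed-point format among the techniques the core relies on. The Python sketch below illustrates, in general form, how batch-normalisation parameters are commonly folded into a convolution's weights and bias and how the resulting values can be evaluated with 16-bit fixed-point arithmetic. It is not the authors' implementation. The function names (`fold_batchnorm`, `to_fixed`, `fixed_mac`), the Q8.8 split (8 integer bits, 8 fractional bits), round-to-nearest and saturation are all hypothetical choices for illustration, because the record does not specify them.

```python
import math

# Assumed 16-bit signed fixed-point format with 8 fractional bits (Q8.8).
# The record only states "16-bit fixed-point"; the 8/8 split is hypothetical.
FRAC_BITS = 8
Q_MIN = -(1 << 15)
Q_MAX = (1 << 15) - 1


def fold_batchnorm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold per-channel batch-normalisation into convolution weights and bias.

    weights[c] is the flattened kernel of output channel c. Because
    BN(y) = gamma * (y - mean) / sqrt(var + eps) + beta is affine in y,
    it can be absorbed into the preceding convolution.
    """
    folded_w, folded_b = [], []
    for c, kernel in enumerate(weights):
        scale = gamma[c] / math.sqrt(var[c] + eps)
        folded_w.append([w * scale for w in kernel])
        folded_b.append((bias[c] - mean[c]) * scale + beta[c])
    return folded_w, folded_b


def saturate(v):
    """Clamp an integer to the signed 16-bit range."""
    return max(Q_MIN, min(Q_MAX, v))


def to_fixed(x, frac_bits=FRAC_BITS):
    """Quantise a real value to 16-bit fixed point (round to nearest, saturate)."""
    return saturate(round(x * (1 << frac_bits)))


def fixed_mac(xs_q, ws_q, bias_q, frac_bits=FRAC_BITS):
    """Fixed-point dot product plus bias.

    Each product carries 2*frac_bits fractional bits, so the wide accumulator
    is shifted right by frac_bits before adding the bias and saturating.
    Python's >> on negative ints is an arithmetic (floor) shift.
    """
    acc = 0  # models an accumulator wider than 16 bits
    for x, w in zip(xs_q, ws_q):
        acc += x * w
    return saturate((acc >> frac_bits) + bias_q)


if __name__ == "__main__":
    w, b = fold_batchnorm([[0.5, -0.25]], [0.0], [2.0], [0.1], [0.0], [1.0], eps=0.0)
    ws_q = [to_fixed(v) for v in w[0]]        # [256, -128]
    xs_q = [to_fixed(1.0), to_fixed(2.0)]     # [256, 512]
    y_q = fixed_mac(xs_q, ws_q, to_fixed(b[0]))
    print(y_q, y_q / (1 << FRAC_BITS))        # prints 26 0.1015625
```

In the usage example the exact floating-point result is 1.0 * 1.0 + 2.0 * (-0.5) + 0.1 = 0.1; the fixed-point result 26/256 = 0.1015625 differs only by the rounding of the bias to the 2^-8 quantisation step. Folding removes the separate normalisation stage at inference time, which is why it suits a hardware datapath that only needs multiply-accumulate and bias addition.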