EDSD: efficient driving scenes detection based on Swin Transformer.

Author: Chen, Wei; Zheng, Ruihan; Jiang, Jiade; Tian, Zijian; Zhang, Fan; Liu, Yi
Source: Multimedia Tools & Applications; Nov 2024, Vol. 83, Issue 39, p87179-87198, 20p
Abstract: In the field of autonomous driving, the detection of targets such as vehicles, bicycles, and pedestrians under complex road conditions is of great importance. Through extensive experimentation, we found that vehicle targets generally occupy large regions of the image but are easily occluded, while small targets such as pedestrians usually appear densely. Detecting targets of widely different sizes is therefore an important challenge for current detectors. To address this issue, we propose a novel hierarchical feature pyramid network structure. This structure comprises a series of CNN-Transformer variant layers, each a superposition of CST neural network modules and Swin Transformer modules. In addition, considering that the heavy computation of global self-attention makes it difficult to apply in autonomous driving, we adopt the shifted window method in SwinFM, which effectively accelerates inference by replacing global self-attention with self-attention computed within local windows. This study uses the Swin Transformer as a baseline. Compared to the baseline, our EDSD model improves the average accuracy by 1.8% and 3.1% on the BDD100K and KITTI datasets, respectively. [ABSTRACT FROM AUTHOR]
Database: Complementary Index
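
The abstract credits the speed-up to computing self-attention inside local windows rather than over the whole feature map, as in the Swin Transformer. Below is a minimal sketch of that window-based self-attention, not the authors' EDSD or SwinFM code; the names (`WindowSelfAttention`, `window_size`) and the use of PyTorch's `nn.MultiheadAttention` are illustrative assumptions.

```python
# Minimal sketch of window-based self-attention (Swin-style), assuming PyTorch.
# Attention is computed independently in non-overlapping ws x ws windows, so the
# cost scales with H*W*ws^2 instead of (H*W)^2 as in global self-attention.
import torch
import torch.nn as nn


def window_partition(x: torch.Tensor, ws: int) -> torch.Tensor:
    """Split a (B, H, W, C) feature map into (B * H/ws * W/ws, ws*ws, C) windows."""
    B, H, W, C = x.shape
    x = x.view(B, H // ws, ws, W // ws, ws, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)


class WindowSelfAttention(nn.Module):
    """Multi-head self-attention restricted to local windows (hypothetical module name)."""

    def __init__(self, dim: int, window_size: int, num_heads: int):
        super().__init__()
        self.ws = window_size
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, H, W, C = x.shape
        windows = window_partition(x, self.ws)          # (num_windows * B, ws*ws, C)
        out, _ = self.attn(windows, windows, windows)   # attention within each window only
        # Reverse the partition back to a (B, H, W, C) feature map.
        out = out.view(B, H // self.ws, W // self.ws, self.ws, self.ws, C)
        return out.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)


if __name__ == "__main__":
    feat = torch.randn(2, 56, 56, 96)                   # e.g. an early-stage feature map
    block = WindowSelfAttention(dim=96, window_size=7, num_heads=3)
    print(block(feat).shape)                            # torch.Size([2, 56, 56, 96])
```

In the full Swin design, alternating blocks shift the window grid by half a window so that information flows across window boundaries; this sketch omits that shift and any relative position bias for brevity.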