Revisiting DETR Pre-training for Object Detection

Autor:	Ma, Yan, Liang, Weicong, Chen, Bohan, Hao, Yiduo, Hou, Bojian, Yue, Xiangyu, Zhang, Chao, Yuan, Yuhui
Rok vydání:	2023
Předmět:	Computer Science - Computer Vision and Pattern Recognition
Druh dokumentu:	Working Paper
Popis:	Motivated by the remarkable achievements of DETR-based approaches on COCO object detection and segmentation benchmarks, recent endeavors have been directed towards elevating their performance through self-supervised pre-training of Transformers while preserving a frozen backbone. Noteworthy advancements in accuracy have been documented in certain studies. Our investigation delved deeply into a representative approach, DETReg, and its performance assessment in the context of emerging models like $\mathcal{H}$-Deformable-DETR. Regrettably, DETReg proves inadequate in enhancing the performance of robust DETR-based models under full data conditions. To dissect the underlying causes, we conduct extensive experiments on COCO and PASCAL VOC probing elements such as the selection of pre-training datasets and strategies for pre-training target generation. By contrast, we employ an optimized approach named Simple Self-training which leads to marked enhancements through the combination of an improved box predictor and the Objects$365$ benchmark. The culmination of these endeavors results in a remarkable AP score of $59.3\%$ on the COCO val set, outperforming $\mathcal{H}$-Deformable-DETR + Swin-L without pre-training by $1.4\%$. Moreover, a series of synthetic pre-training datasets, generated by merging contemporary image-to-text(LLaVA) and text-to-image (SDXL) models, significantly amplifies object detection capabilities.
Databáze:	arXiv
Externí odkaz:	http://arxiv.org/abs/2308.01300 Zobrazit plný text záznamu View this record from Arxiv