Description: |
General-purpose object detectors have a high misdetection rate for small objects. Although many small-object detectors address the insufficient representation of such objects, they still perform poorly on very tiny objects in aerial images that closely resemble other objects and the background. In this study, we analyze the spatial and semantic misalignment of features caused by the resizing operations (interpolation and pooling) applied before multi-scale feature fusion. Additionally, the objectness loss uses IoU values as its learning target, and these values are sensitive to minute positional differences between the detector's predictions for small objects and the ground truth. Therefore, the neck and head architecture of the proposed You Only Look Once version 7 Fusion (YOLOv7F) model is redesigned for small-object detection. The YOLOv7F model includes the Deformable Feature Fusion (DFF) module, which aligns features based on guided features, and the Objectness Refinement Head (ORH), which refines the predicted objectness score. On the AI-TODv2 dataset, where small objects account for 98.1% of all instances, the YOLOv7F model achieved an $mAP_{0.5}$ of 63.9%, a 4.1% improvement over the YOLOv7X model. On the VisDrone2019-DET dataset, where 32.0% of instances are larger than a medium-sized object, the YOLOv7F model achieved an $mAP_{0.5}$ of 63.9%, a 2.0% improvement over the YOLOv7X model. |
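
The abstract's claim that IoU is sensitive to small positional differences for small objects can be illustrated with a short calculation. The sketch below is not taken from the YOLOv7F implementation. It assumes square, axis-aligned boxes, and the box sizes (8, 32, 128 pixels) and the 2-pixel diagonal shift are hypothetical values chosen only for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp at zero so non-overlapping boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def shifted_iou(size, shift):
    """IoU between a size x size ground-truth box and the same box
    displaced by `shift` pixels along both axes (illustrative setup)."""
    gt = (0.0, 0.0, float(size), float(size))
    pred = (shift, shift, size + shift, size + shift)
    return iou(gt, pred)


if __name__ == "__main__":
    # Hypothetical box sizes; the same 2-pixel error is applied to each.
    for size in (8, 32, 128):
        print(f"{size:>3} px box, 2 px shift: IoU = {shifted_iou(size, 2.0):.3f}")
```

Under these assumptions, the same 2-pixel localization error leaves an IoU of about 0.39 for the 8-pixel box, about 0.78 for the 32-pixel box, and about 0.94 for the 128-pixel box. An objectness target derived from IoU therefore changes far more for tiny objects than for large ones, given the same pixel error.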