Popis: |
Object detection in complex remote sensing imagery remains challenging because the YOLO backbone network struggles to learn feature distributions adaptively, which limits multi-scale feature learning and lowers detection accuracy for small and occluded objects. To address this, this paper introduces a lightweight Enhanced YOLOv8 model with a WBiFPN (Weighted Bidirectional Feature Pyramid Network), designed to improve multi-scale feature learning. The model uses a WBiFPN-based feature fusion network and adds the EMA (Efficient Multi-Scale Attention) module to strengthen the representation of semantic and spatial information, thereby deepening the integration of multi-scale features. In the shallow layers of the backbone network, it integrates RepConv (Re-parameterized Convolution) and ConvNeXt C2f to optimize feature extraction, while the deeper layers include a BoT (Bottleneck Transformer) module to further enhance multi-scale feature extraction. To reduce the parameter count and computational complexity, the neck network employs a simplified Slim-Neck structure. In experiments on the NWPU VHR-10, DIOR, and DOTA datasets, the proposed model achieves mean Average Precision (mAP@0.5) of 94.8%, 91.6%, and 82.0%, respectively, representing improvements of 3.2%, 2.5%, and 2.5% over the YOLOv8-n/s/m/l/x series models. The average inference speeds are 82 fps, 79 fps, and 76 fps, respectively, which meets real-time inference requirements. The Enhanced YOLOv8 model also outperforms other mainstream models in detection performance. |