Abstract: |
Convolutional Neural Networks (CNNs) are well studied and widely used for object detection thanks to their high accuracy. However, high accuracy on its own says little about the effective performance of CNN-based models, especially when real-time detection tasks are involved. To the best of our knowledge, the available methods have not been sufficiently evaluated in terms of their speed/accuracy trade-off. This work presents a review and hands-on evaluation of the most fundamental object detection models on the Common Objects in Context (COCO) dataset with respect to this trade-off, their memory footprint, and their computational and storage costs. In addition, we review available datasets for medical mask detection and train YOLOv5 on the Properly Wearing Masked Faces Dataset (PWMFD). Next, we test and evaluate a set of specific optimization techniques, transfer learning, data augmentations, and attention mechanisms, and we report on their effect on real-time mask detection. Based on our findings, we propose an optimized model based on YOLOv5s using transfer learning for the detection of correctly and incorrectly worn medical masks; on the PWMFD it runs more than twice as fast (69 frames per second) as the state-of-the-art model SE-YOLOv3 while maintaining the same level of mean Average Precision (67%). [ABSTRACT FROM AUTHOR] |