Description: |
TrackNet, a deep learning network, was proposed to track small, high-speed objects such as tennis balls and shuttlecocks in videos. To cope with image quality problems such as blur, afterimages, and short-term occlusion, several consecutive frames are fed to the network together to detect a flying object. In this work, TrackNetV2 is proposed to improve on TrackNet in several respects, especially processing speed, prediction accuracy, and GPU memory usage. First, the processing speed is improved from 2.6 FPS to 31.8 FPS. This speedup is achieved by reducing the input image size and re-engineering the network from a Multiple-In Single-Out (MISO) design to a Multiple-In Multiple-Out (MIMO) design. Second, to improve prediction accuracy, a comprehensive dataset of 55,563 frames from 18 diverse badminton match videos is collected and labeled for training and testing. In addition, the network is composed not only of VGG16 and upsampling layers but also of U-Net. Finally, to reduce GPU memory usage, the data structure of the heatmap layer is changed from a pixel-wise one-hot encoded 3D array to a real-valued 2D array. To reflect this change in heatmap representation, the loss function is redesigned from an RMSE-based function to a weighted cross-entropy-based function. An overall validation shows that the accuracy, precision, and recall of TrackNetV2 reach 96.3%, 97.0%, and 98.7%, respectively, in the training phase and 85.2%, 97.2%, and 85.4% in a test on a brand-new match. The 3-in, 3-out version of TrackNetV2 reaches a processing speed of 31.84 FPS. The dataset and source code of this work are available at https://nol.cs.nctu.edu.tw:234/open-source/TrackNetv2/. |