SeaTurtleID2022: A long-span dataset for reliable sea turtle re-identification

Authors: Adam, Lukáš; Čermák, Vojtěch; Papafitsoros, Kostas; Picek, Lukáš
Publication year: 2022
Subject:
Source: Proceedings of the 2024 IEEE/CVF Winter Conference on Applications of Computer Vision, pages 7146-7156
Document type: Working Paper
Description: This paper introduces SeaTurtleID2022 (https://www.kaggle.com/datasets/wildlifedatasets/seaturtleid2022), the first public large-scale, long-span dataset of sea turtle photographs captured in the wild. The dataset contains 8729 photographs of 438 unique individuals collected over 13 years, making it the longest-spanning dataset for animal re-identification. All photographs include various annotations, e.g., identity, encounter timestamp, and body-part segmentation masks. Instead of standard "random" splits, the dataset allows for two realistic and ecologically motivated splits: (i) a time-aware closed-set split with training, validation, and test data from different days/years, and (ii) a time-aware open-set split with new, unknown individuals in the test and validation sets. We show that time-aware splits are essential for benchmarking re-identification methods, as random splits lead to performance overestimation. Furthermore, baseline instance segmentation and re-identification performance over various body parts is provided. Finally, an end-to-end system for sea turtle re-identification is proposed and evaluated. The proposed system, based on Hybrid Task Cascade for head instance segmentation and an ArcFace-trained feature extractor, achieved an accuracy of 86.8%.
Comment: SeaTurtleID2022 is the latest version of the SeaTurtleID dataset, which was described in previous versions of this arXiv submission. Note that the title changed in the latest version.
Database: arXiv
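
Note: The time-aware closed-set and open-set splits mentioned in the description can be illustrated with a minimal sketch. This is not the authors' split procedure: it assumes a single cutoff date rather than the day/year grouping the paper describes, and the record layout (image id, identity, encounter date), the function name `time_aware_split`, and the sample data are all hypothetical.

```python
from datetime import date


def time_aware_split(records, cutoff, open_set=False):
    """Split (image_id, identity, encounter_date) tuples around a cutoff date.

    Assumption: one cutoff date separates training from test data. The
    record layout is hypothetical and not the dataset's actual schema.
    """
    # Everything photographed before the cutoff is training data.
    train = [r for r in records if r[2] < cutoff]
    later = [r for r in records if r[2] >= cutoff]

    # Identities the model has seen during training.
    known = {identity for _, identity, _ in train}

    if open_set:
        # Open-set: later photos of unseen individuals stay in the test set.
        test = later
    else:
        # Closed-set: keep only later photos of individuals seen in training.
        test = [r for r in later if r[1] in known]
    return train, test


records = [
    ("img1", "t001", date(2010, 7, 1)),
    ("img2", "t001", date(2015, 6, 12)),
    ("img3", "t002", date(2012, 8, 3)),
    ("img4", "t003", date(2016, 7, 20)),
]

train, test = time_aware_split(records, cutoff=date(2014, 1, 1))
_, open_test = time_aware_split(records, cutoff=date(2014, 1, 1), open_set=True)
```

In this toy data, individual `t003` first appears after the cutoff, so its photo is dropped from the closed-set test set and kept in the open-set one. A random split, by contrast, could place photos from the same encounter in both training and test data, which is the source of the performance overestimation the description reports.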