Image-based taxonomic classification of bulk insect biodiversity samples using deep learning and domain adaptation

Authors: Tomochika Fujisawa, Emmanouil Meramveliotakis, Anna Papadopoulou, Alfried Vogler, Víctor Noguerales
Contributors: European Commission, Ministerio de Ciencia e Innovación (España), Agencia Estatal de Investigación (España), JSPS KAKENHI
Language: English
Year of publication: 2023
Description: Complex bulk samples of insects from biodiversity surveys present a challenge for taxonomic identification, which could be overcome by high-throughput imaging combined with machine learning for rapid classification of specimens. These procedures require that taxonomic labels from an existing source data set are used for model training and prediction of an unknown target sample. However, such transfer learning may be problematic for the study of new samples not previously encountered in an image set, for example, from unexplored ecosystems, and may require methods of domain adaptation that reduce the differences in the feature distribution of the source and target domains (training and test sets). We assessed the efficiency of domain adaptation for family-level classification of bulk samples of Coleoptera, as a critical first step in the characterization of biodiversity samples. Neural network models trained with images from a global database of Coleoptera were applied to a biodiversity sample from understudied forests in Cyprus as the target. Within-dataset classification accuracy reached 98% and depended on the number and quality of training images, and on dataset complexity. The accuracy of between-dataset predictions (across disparate source–target pairs that do not share any species or genera) was at most 82% and depended greatly on the standardization of the imaging procedure. An algorithm for domain adaptation, domain-adversarial training of neural networks (DANN), significantly improved the prediction performance of models trained with non-standardized, low-quality images. Our findings demonstrate that existing databases can be used to train models and successfully classify images from unexplored biota, but the imaging conditions and classification algorithms need careful consideration.
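DANN, the domain adaptation method named above, trains a domain classifier to tell source images from target images, while a gradient reversal layer pushes the shared feature extractor to make that distinction hard, so that source and target features become more alike. The code below is a minimal sketch of that idea, not the authors' implementation. It assumes a toy model with a single scalar feature (h = w·x), logistic label and domain heads, and hand-derived binary cross-entropy gradients. The function names `dann_step`, `grad_reverse` and `dann_lambda`, the toy data, and the λ ramp (the schedule proposed in Ganin and Lempitsky's original DANN paper) are illustrative assumptions. The abstract does not state which framework, architecture or schedule the study used.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def dann_lambda(progress: float, gamma: float = 10.0) -> float:
    """Assumed adaptation-weight schedule (Ganin & Lempitsky): ramps from 0 toward 1."""
    return 2.0 / (1.0 + math.exp(-gamma * progress)) - 1.0


def grad_reverse(grad: float, lam: float) -> float:
    """Backward pass of a gradient reversal layer (its forward pass is the identity)."""
    return -lam * grad


def dann_step(params, source, target, lam, lr=0.1):
    """One gradient step of a hypothetical toy scalar DANN.

    params: dict with w (feature extractor), a, c (label head), v, b (domain head).
    source: list of (x, y) labelled source examples, domain label 0.
    target: list of unlabelled target inputs x, domain label 1.
    """
    w, a, c, v, b = (params[k] for k in "wacvb")
    g = dict.fromkeys("wacvb", 0.0)
    # Label loss: only the labelled source examples contribute.
    for x, y in source:
        h = w * x
        err = sigmoid(a * h + c) - y  # d(BCE)/d(logit)
        g["a"] += err * h / len(source)
        g["c"] += err / len(source)
        g["w"] += err * a * x / len(source)
    # Domain loss: source examples labelled 0, target examples labelled 1.
    domain = [(x, 0.0) for x, _ in source] + [(x, 1.0) for x in target]
    for x, t in domain:
        h = w * x
        err = sigmoid(v * h + b) - t
        g["v"] += err * h / len(domain)  # domain head descends the domain loss
        g["b"] += err / len(domain)
        # The gradient reaching the feature extractor passes through the reversal,
        # so w ascends the domain loss (features become harder to tell apart).
        g["w"] += grad_reverse(err * v, lam) * x / len(domain)
    return {k: params[k] - lr * g[k] for k in params}


if __name__ == "__main__":
    # Hypothetical toy data: target inputs are shifted relative to the source.
    params = {"w": 1.0, "a": 0.5, "c": 0.0, "v": 0.5, "b": 0.0}
    source = [(0.2, 0.0), (0.8, 1.0), (0.3, 0.0), (0.9, 1.0)]
    target = [1.2, 1.5, 1.1, 1.7]
    epochs = 100
    for epoch in range(epochs):
        lam = dann_lambda(epoch / epochs)  # ramp the adversarial weight up over training
        params = dann_step(params, source, target, lam)
    print(params)
```

The ramp keeps λ near 0 early on, so the label head can learn from noisy domain-classifier gradients before adversarial pressure builds. In a real image model, `w` would be a convolutional feature extractor and the reversal would be a custom autograd operation in a deep learning framework.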
This work was supported by the iBioGen project, which has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 810729. We are grateful to Richard Turney and Thomas J. Creedy (Natural History Museum, London) for advice and support during bulk-sample imaging, and Takashi Imai (Shiga University) for helpful advice on deep learning methods. We also thank three anonymous referees for their constructive and valuable comments on an earlier version of the manuscript. We would like to extend our gratitude to Andreas Dimitriou for help during sample imaging, and Konstantinos Ntatsopoulos for support in the taxonomy of Cyprus beetles. Víctor Noguerales was supported by a postdoctoral contract under the iBioGen project and a “Juan de la Cierva-Formación” postdoctoral fellowship (grant: FJC2018-035611-I) funded by MCIN/AEI/10.13039/501100011033. Tomochika Fujisawa was supported by JSPS KAKENHI (grant number: 20K06824).
Database: OpenAIRE