Generating self-labeled geological datasets for semantic segmentation using pretrained GANs

Authors: Ivan Ferreira, Ardiansyah Koeshidayatullah
Year of publication: 2023
DOI: 10.5194/egusphere-egu23-888
Description: Recent advances in deep generative models, notably generative adversarial networks (GANs), have led many researchers to explore the feasibility of using realistic synthetic data as (i) a digital twin of the original dataset and (ii) a new way to augment the original dataset. Previous work has shown that GANs can replicate both the aesthetic and the statistical characteristics of a dataset, to the point that generated samples are indistinguishable from real ones even when examined by domain experts. In addition, the weights learned during the unsupervised training of these generative models can be used to extract specific features of interest from the dataset. In the geosciences, many computer vision tasks involve semantic segmentation, from pore quantification to fossil characterization. In such tasks, labeling is the main bottleneck, being both time-consuming and dependent on domain experts. In this study, we therefore repurpose GANs to obtain self-labeled geological datasets for semantic segmentation that are readily applicable in geological machine learning workflows. We used trained style-based GANs of foraminifera specimens, ooids, and mudstones. Our experiments show that, with one or a few labels, we can generate self-labeled synthetic datasets featuring the labels of interest. This result is pivotal for the geosciences, both in exploring GANs for one-shot and few-shot segmentation and in reducing the manual labeling effort that segmentation demands of domain experts.
Database: OpenAIRE