Review and analysis of synthetic dataset generation methods and techniques for application in computer vision

Authors: Goran Paulin, Marina Ivasic-Kos
Language: English
Year of publication: 2023
Subject:
Description: Synthetic datasets, for which we propose the term synthsets, are not a novelty but have become a necessity. They have been used in computer vision since 1989 to help solve the problem of collecting enough annotated data for supervised machine learning, yet the intensive development of methods and techniques for generating them has taken place only in the last decade. Today, the question is no longer whether to use synthetic datasets but how to create them optimally. Motivated by the idea of discovering best practices for building synthetic datasets that represent dynamic environments (such as traffic, crowds, and sports), this study provides an overview of existing synthsets in the computer vision domain. We analyze the methods and techniques of synthetic dataset generation: from the first low-resolution generators to the latest generative adversarial training methods, and from simple techniques that improve realism by adding global noise to those meant to close domain and distribution gaps. The analysis identifies nine distinct but potentially intertwined methods and yields a synthset generation diagram of 17 individual processes that synthset creators should follow and choose from, depending on the specific requirements of their task.
Database: OpenAIRE