Author: |
Joakim Linden, Erasmus Cedernaes, Masoud Daneshtalab, Håkan Forsberg, Emil Gustafsson Ek, Emil Tagebrand, Josef Haddad |
Year of publication: |
2021 |
Subject: |
|
Source: |
2021 IEEE/AIAA 40th Digital Avionics Systems Conference (DASC). |
DOI: |
10.1109/dasc52595.2021.9594400 |
Description: |
In machine learning systems, several factors affect the performance of a trained model. The most important include the model architecture, the amount of training time, and the size and diversity of the dataset. In safety-critical machine learning, the datasets used need to reflect the environment in which the system is intended to operate, in order to minimize the generalization gap between training inputs and real-world inputs. Datasets should be thoroughly prepared, and requirements on the properties and characteristics of the collected data need to be specified. In our work we present a case study in which a synthetic dataset is generated based on real-world flight data from the ADS-B system. The flight data contain thousands of approaches to several airports and are used to identify real-world statistical distributions of the relevant variables to vary within our dataset sampling space. We also investigate the effects of training a model on synthetic data to different extents, including training on translated image sets (using domain adaptation). Our results indicate that airport location is the most critical parameter to vary. We also conclude that all experiments benefited in performance from pre-training on synthetic data rather than using only real data; however, this did not hold in general for images translated with domain adaptation. |
Database: |
OpenAIRE |
External link: |
|