More Machine Learning for Less: Comparing Data Generation Strategies in Mechanical Engineering and Manufacturing
Author: | Philipp Noodt, Alexia Fenollar Solvay, Tobias Meisen, Johannes Lipp, Vladimir Samsonov |
---|---|
Year of publication: | 2019 |
Subject: | 0209 industrial biotechnology; Adaptive sampling; Data collection; Active learning (machine learning); Computer science; Test data generation; Design of experiments; Stability (learning theory); Mechanical engineering; Sample (statistics); 02 engineering and technology; Machine learning; 020901 industrial engineering & automation; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Use case; Artificial intelligence |
Source: | SSCI |
DOI: | 10.1109/ssci44817.2019.9002663 |
Description: | Supervised Machine Learning (ML) models require extensive training data to properly approximate the behavior of complex mechanical processes and systems. Real-world experiments or adequate simulations are expensive, time-consuming or incident-related, which makes the efficient acquisition of sample data a compelling necessity. In mechanical engineering and manufacturing, data is usually collected via established Design of Experiments (DOE) methods. At the same time, Active Learning (AL) is gaining importance in the research community and promises a reduction in the amount of data required, but is rarely used in industry. In this paper, we compare the most common data sampling methods with AL to achieve better predictive results with fewer samples on regression tasks. We propose a novel evaluation framework that allows various sampling methods to be compared in a controlled and unbiased manner, regardless of their different requirements. Using three exemplary use cases (UCs), we evaluate when one should use AL or DOE methods for data generation by looking at the sample efficiency, stability and predictive accuracy of the resulting ML models. This paper provides practical guidance to both engineers and data scientists who require highly efficient data collection for later use in ML. |
Database: | OpenAIRE |
External link: |