Improving classification accuracy using data augmentation on small data sets
Authors: | José M. Jerez, Leonardo Franco, Francisco J. Moreno-Barea |
---|---|
Year of publication: | 2020 |
Subject: | 0209 industrial biotechnology; Artificial neural network; Degree (graph theory); business.industry; Computer science; Deep learning; General Engineering; Pattern recognition; 02 engineering and technology; Type (model theory); Class (biology); Computer Science Applications; Data set; 020901 industrial engineering & automation; Artificial Intelligence; 0202 electrical engineering, electronic engineering, information engineering; Key (cryptography); 020201 artificial intelligence & image processing; Artificial intelligence; Element (category theory); business; Representation (mathematics) |
Source: | Expert Systems with Applications. 161:113696 |
ISSN: | 0957-4174 |
DOI: | 10.1016/j.eswa.2020.113696 |
Description: | Data augmentation (DA) is a key element in the success of Deep Learning (DL) models, as it can improve prediction accuracy when large data sets are used. DA was rarely used with neural network models before 2012, possibly because of the type of models and the size of the data sets used at the time. In this work we investigate the effect of DA on small data sets, applying several state-of-the-art models based on Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs) and analyzing the prediction accuracy obtained according to the characteristics of the training samples (number of instances, number of features, and degree of class imbalance). We further modify the standard methods used to generate the synthetic samples in order to alter the class balance representation. Overall, the results indicate that, with some computational effort, a significant increase in prediction accuracy can be obtained on small data sets. |
Database: | OpenAIRE |