ASIDS: A Robust Data Synthesis Method for Generating Optimal Synthetic Samples

Authors: Yukun Du, Yitao Cai, Xiao Jin, Hongxia Wang, Yao Li, Min Lu
Language: English
Year of publication: 2023
Subject:
Source: Mathematics, Vol 11, Iss 18, p 3891 (2023)
Document type: article
ISSN: 2227-7390
DOI: 10.3390/math11183891
Description: Most existing data synthesis methods are designed to address dataset imbalance, data anonymization, and insufficient sample size. Effective synthesis methods are lacking for cases where real datasets have few data points but many features and unknown noise. In this paper we therefore propose a data synthesis method named Adaptive Subspace Interpolation for Data Synthesis (ASIDS). The idea is to divide the original feature space into several subspaces that each contain the same number of data points, and then to interpolate between data points in adjacent subspaces. The method can adaptively adjust the sample size of a synthetic dataset that contains unknown noise, and the generated samples typically contain minimal errors. It also adjusts the feature composition of the data points, which can significantly reduce the proportion of data points with large fitting errors. Furthermore, the method's hyperparameters have an intuitive interpretation and usually require little calibration. Results on simulated original data and on benchmark datasets demonstrate that ASIDS is a robust and stable data synthesis method.
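The description gives the core idea only in outline. The sketch below shows one possible reading of "equal-count subspaces plus interpolation between adjacent subspaces." It assumes the subspaces are formed by sorting rows along a single feature and splitting them into groups of (nearly) equal size. It also assumes each synthetic point is a linear interpolation between a point and a randomly chosen partner in the next group. The function name `adjacent_subspace_interpolation` and its parameters are hypothetical. This is not the authors' ASIDS implementation, which also adapts sample size and feature composition in ways the abstract does not specify.

```python
import numpy as np


def adjacent_subspace_interpolation(X, n_subspaces=4, sort_feature=0, rng=None):
    """Hypothetical illustration, NOT the published ASIDS algorithm.

    Assumptions (not taken from the paper):
      * "subspaces" = equal-count groups obtained by sorting rows on one feature;
      * each synthetic row lies on the segment between a row in group g and a
        randomly chosen row in the adjacent group g + 1.
    """
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    if not 2 <= n_subspaces <= len(X):
        raise ValueError("need 2 <= n_subspaces <= number of rows")

    # Sort rows by the chosen feature, then split into near-equal-size groups.
    order = np.argsort(X[:, sort_feature], kind="stable")
    groups = np.array_split(order, n_subspaces)

    synthetic = []
    for left, right in zip(groups[:-1], groups[1:]):
        # One random partner from the adjacent subspace for each left-hand row.
        partners = rng.choice(right, size=len(left), replace=True)
        lam = rng.uniform(0.0, 1.0, size=(len(left), 1))
        # Linear interpolation: x_new = x_i + lam * (x_j - x_i), lam in [0, 1).
        synthetic.append(X[left] + lam * (X[partners] - X[left]))
    return np.vstack(synthetic)


if __name__ == "__main__":
    X = np.random.default_rng(1).normal(size=(20, 5))
    S = adjacent_subspace_interpolation(X, n_subspaces=4, rng=42)
    print(S.shape)  # 20 rows -> 4 groups of 5 -> 3 adjacent pairs x 5 = (15, 5)
```

Because every synthetic row is a convex combination of two real rows, each of its coordinates stays within the observed range of that feature. That containment is one simple way interpolation-based synthesis can keep generated samples close to the original data.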
Database: Directory of Open Access Journals