OCT-GAN: Neural ODE-based Conditional Tabular GANs
Author: | Jinsung Jeon, Jae-Hoon Lee, Jayoung Kim, Noseong Park, Jihyeon Hyeong |
---|---|
Year of publication: | 2021 |
Subject: |
FOS: Computer and information sciences
Computer Science - Machine Learning; Discriminator; Computer science; Concatenation; Ode; 02 engineering and technology; 010501 environmental sciences; computer.software_genre; 01 natural sciences; Synthetic data; Machine Learning (cs.LG); Ordinary differential equation; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Node (circuits); Data mining; Cluster analysis; computer; 0105 earth and related environmental sciences; Generator (mathematics) |
Source: | WWW |
DOI: | 10.1145/3442381.3449999 |
Description: | Synthesizing tabular data is attracting much attention these days for various purposes. With sophisticated synthetic data, for instance, one can augment one's training data. Over the past couple of years, tabular data synthesis techniques have greatly improved. Recent work has made progress in addressing many problems in synthesizing tabular data, such as the imbalanced distribution and multimodality problems. However, the data utility of state-of-the-art methods is not yet satisfactory. In this work, we significantly improve the utility by designing our generator and discriminator based on neural ordinary differential equations (NODEs). After showing that NODEs have theoretically preferred characteristics for generating tabular data, we introduce our designs. The NODE-based discriminator classifies based on the evolution trajectory of a hidden vector rather than on the hidden vector at the last layer only. Our generator also adopts an ODE layer at the very beginning of its architecture to transform its initial input vector (i.e., the concatenation of a noisy vector and a condition vector in our case) onto another latent vector space suitable for the generation process. We conduct experiments with 13 datasets, including insurance fraud detection and online news article prediction, and our presented method outperforms other state-of-the-art tabular data synthesis methods in many cases of our classification, regression, and clustering experiments. Accepted by WWW 2021 |
Database: | OpenAIRE |
External link: |
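
The two design ideas named in the description can be illustrated with a small sketch. It is not the authors' implementation: the dynamics function, the fixed-step Euler solver, the fixed time points, the random weights, and every name below (`ode_func`, `euler_trajectory`, `trajectory_score`, `generator_input`) are assumptions made for illustration only. It shows (a) a discriminator-style score computed from the concatenation of hidden states recorded along an ODE trajectory rather than from the last state alone, and (b) a generator-style input formed by concatenating a noise vector with a condition vector and then passing it through an ODE layer.

```python
import math
import random


def ode_func(h, t, weights):
    """Hypothetical dynamics dh/dt = tanh(W h + t); stands in for a learned network."""
    return [math.tanh(sum(w * x for w, x in zip(row, h)) + t) for row in weights]


def euler_trajectory(h0, weights, t_points, steps_per_interval=20):
    """Integrate dh/dt with fixed-step Euler and record h at every time in t_points."""
    h = list(h0)
    trajectory = [list(h)]  # state at t_points[0]
    for t_start, t_end in zip(t_points, t_points[1:]):
        dt = (t_end - t_start) / steps_per_interval
        t = t_start
        for _ in range(steps_per_interval):
            dh = ode_func(h, t, weights)
            h = [x + dt * d for x, d in zip(h, dh)]
            t += dt
        trajectory.append(list(h))
    return trajectory


def trajectory_score(trajectory, readout):
    """Sigmoid score over the concatenation of ALL recorded states, not only the last."""
    features = [x for state in trajectory for x in state]
    logit = sum(w * x for w, x in zip(readout, features))
    return 1.0 / (1.0 + math.exp(-logit))


def generator_input(noise, condition):
    """Concatenate a noise vector and a condition vector into one initial vector."""
    return list(noise) + list(condition)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
t_points = [0.0, 0.5, 1.0]  # assumed fixed here purely for illustration

# (a) Discriminator-style scoring from the whole trajectory of a hidden vector.
dim = 4
disc_weights = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]
h0 = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
trajectory = euler_trajectory(h0, disc_weights, t_points)
readout = [rng.uniform(-0.5, 0.5) for _ in range(dim * len(t_points))]
score = trajectory_score(trajectory, readout)

# (b) Generator-style input: noise concatenated with a one-hot condition, then an ODE layer.
noise = [rng.gauss(0.0, 1.0) for _ in range(3)]
condition = [0.0, 1.0]
z0 = generator_input(noise, condition)
gen_weights = [[rng.uniform(-0.5, 0.5) for _ in range(len(z0))] for _ in range(len(z0))]
z_transformed = euler_trajectory(z0, gen_weights, [0.0, 1.0])[-1]
```

In this sketch the score depends on three recorded states (twelve features) instead of the four features of the final state, which is the sense in which the classification is trajectory-based. A real model would learn the dynamics and the readout and would normally use an adaptive ODE solver rather than fixed-step Euler.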