Cross-table Synthetic Tabular Data Detection

Author: Kindji, G. Charbel N., Rojas-Barahona, Lina Maria, Fromont, Elisa, Urvoy, Tanguy
Publication year: 2024
Subject:
Source: COLING 2025 Workshop on Detecting AI Generated Content, Jan 2025, Abu Dhabi, United Arab Emirates
Document type: Working Paper
Description: Detecting synthetic tabular data is essential to prevent the distribution of false or manipulated datasets that could compromise data-driven decision-making. This study explores whether synthetic tabular data can be reliably identified "in the wild", meaning across different generators, domains, and table formats. This challenge is unique to tabular data, where structures (such as the number of columns, data types, and formats) can vary widely from one table to another. We propose three cross-table baseline detectors and four distinct evaluation protocols, each corresponding to a different level of "wildness". Our very preliminary results confirm that cross-table adaptation is a challenging task.
Database: arXiv