Author: |
Kindji, G. Charbel N., Rojas-Barahona, Lina Maria, Fromont, Elisa, Urvoy, Tanguy |
Publication year: |
2024 |
Subject: |
|
Source: |
COLING 2025 Workshop on Detecting AI Generated Content, Jan 2025, Abu Dhabi, United Arab Emirates |
Document type: |
Working Paper |
Description: |
Detecting synthetic tabular data is essential to prevent the distribution of false or manipulated datasets that could compromise data-driven decision-making. This study explores whether synthetic tabular data can be reliably identified "in the wild", meaning across different generators, domains, and table formats. This challenge is unique to tabular data, where structures (such as number of columns, data types, and formats) can vary widely from one table to another. We propose three cross-table baseline detectors and four distinct evaluation protocols, each corresponding to a different level of "wildness". Our very preliminary results confirm that cross-table adaptation is a challenging task. |
Database: |
arXiv |
External link: |
|