Improving Detection of ChatGPT-Generated Fake Science Using Real Publication Text: Introducing xFakeBibs a Supervised-Learning Network Algorithm

Author:	Ahmed Abdeen Hamed, Xindong Wu
Year of publication:	2023
DOI:	10.21203/rs.3.rs-2851222/v1
Description:	ChatGPT is becoming a new reality. In this paper, we show how to distinguish ChatGPT-generated publications from their counterparts produced by scientists, using a newly designed supervised machine learning algorithm. The algorithm was trained on 100 real publications and calibrated using 10 folds of real publications. When comparing the training set with the calibration folds, we found that the similarity fluctuated between 19% and 21% bigram overlap. The calibrating folds contributed 51%-70% new bigrams, while ChatGPT contributed only 23% (less than 50% of the contribution of any of the 10 calibrating folds). When classifying the individual articles, the xFakeBibs algorithm predicted 98 of 100 publications as fake, while 2 articles failed the test and were classified as real publications. We introduced an algorithmic approach that detected the ChatGPT-generated articles with a high degree of accuracy. However, it remains challenging to detect all fake records. This work is a step in the right direction to counter fake science and misinformation.
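The bigram comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' xFakeBibs implementation: the function names `bigrams` and `compare_to_training`, the regex tokenizer, and the choice to normalize by the candidate's own bigram count are all assumptions made for illustration, so the fractions it returns will not reproduce the percentages reported above.

```python
import re


def bigrams(text):
    """Return the set of adjacent lowercase word pairs in `text`.

    The tokenizer (runs of letters and digits) is an assumption;
    the abstract does not say how the text was tokenized.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    return set(zip(words, words[1:]))


def compare_to_training(training_texts, candidate_texts):
    """Return (overlap, new) as fractions of the candidate's distinct bigrams.

    overlap: share of candidate bigrams already seen in the training texts.
    new:     share of candidate bigrams not seen in the training texts.
    Normalizing by the candidate's bigram count is a hypothetical choice.
    """
    train = set().union(*(bigrams(t) for t in training_texts))
    cand = set().union(*(bigrams(t) for t in candidate_texts))
    if not cand:
        return 0.0, 0.0
    overlap = len(cand & train) / len(cand)
    return overlap, 1.0 - overlap


# Toy data, invented for this example only.
training = ["the gene regulates protein expression in cells"]
candidate = ["the gene regulates tumor growth in cells"]
overlap, new = compare_to_training(training, candidate)
print(f"overlap={overlap:.2f} new={new:.2f}")
```

In a setup like the one described, a function of this kind would be applied once per calibrating fold and once to the ChatGPT-generated set, and a candidate set whose share of new bigrams falls well below the range seen across the folds would be flagged as suspicious.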
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::0fd0731d2d466d386a4e8571a022c90e https://doi.org/10.21203/rs.3.rs-2851222/v1