Description: |
ChatGPT is becoming a new reality. In this paper, we show how to distinguish ChatGPT-generated publications from those written by scientists, using xFakeBibs, a newly designed supervised machine learning algorithm. The algorithm was trained on 100 real publications and calibrated using 10 folds of real publications. Comparing the training set with the calibration folds, we found that bigram overlap fluctuated between 19% and 21%. The calibration folds contributed between 51% and 70% new bigrams, while ChatGPT contributed only 23%, less than half the contribution of any of the 10 calibration folds. When classifying individual articles, xFakeBibs labeled 98 of 100 publications as fake, while 2 articles failed the test and were classified as real publications. The approach thus detected ChatGPT-generated articles with a high degree of accuracy, although detecting all fake records remains challenging. This work is a step toward countering fake science and misinformation. |