A Study on Distinguishing ChatGPT-Generated and Human-Written Orthopaedic Abstracts by Reviewers: Decoding the Discrepancies.

Author: Makiev KG; Department of Orthopaedics, University General Hospital of Alexandroupolis, Democritus University of Thrace, Alexandroupoli, GRC., Asimakidou M; School of Medicine, University General Hospital of Alexandroupolis, Democritus University of Thrace, Alexandroupoli, GRC., Vasios IS; Department of Orthopaedics, University General Hospital of Alexandroupolis, Democritus University of Thrace, Alexandroupoli, GRC., Keskinis A; Department of Orthopaedics, University General Hospital of Alexandroupolis, Democritus University of Thrace, Alexandroupoli, GRC., Petkidis G; Department of Orthopaedics, University General Hospital of Alexandroupolis, Democritus University of Thrace, Alexandroupoli, GRC., Tilkeridis K; Department of Orthopaedics, University General Hospital of Alexandroupolis, Democritus University of Thrace, Alexandroupoli, GRC., Ververidis A; Department of Orthopaedics, University General Hospital of Alexandroupolis, Democritus University of Thrace, Alexandroupoli, GRC., Iliopoulos E; Department of Orthopaedics, University General Hospital of Alexandroupolis, Democritus University of Thrace, Alexandroupoli, GRC.
Language: English
Source: Cureus [Cureus] 2023 Nov 21; Vol. 15 (11), pp. e49166. Date of Electronic Publication: 2023 Nov 21 (Print Publication: 2023).
DOI: 10.7759/cureus.49166
Abstract: Background: ChatGPT (OpenAI Incorporated, Mission District, San Francisco, United States) is an artificial intelligence (AI)-based language model that generates human-resembling text. This AI-generated writing is comprehensible and contextually relevant, making it genuinely difficult to distinguish from human-written content. ChatGPT has recently risen in popularity and is widely utilized in scholarly manuscript drafting. The aim of this study is to determine whether (1) human reviewers can differentiate between AI-generated and human-written abstracts and (2) AI detectors are currently reliable in detecting AI-generated abstracts.
Methods: Seven blinded reviewers were asked to read 21 abstracts and judge which were AI-generated and which were human-written. The first group consisted of three orthopaedic residents with limited research experience (OR). The second group included three orthopaedic professors with extensive research experience (OP). The seventh reviewer was a non-orthopaedic doctor and acted as a control in terms of expertise. All abstracts were scanned by a plagiarism detection program. The performance of two different AI detectors in identifying AI-generated abstracts was also analyzed. A structured interview was conducted at the end of the survey to evaluate the decision-making process used by each reviewer.
Results: The OR group correctly identified the authorship of 34.9% of the abstracts and the OP group 31.7%. The non-orthopaedic control correctly identified 76.2%. All AI-generated abstracts were 100% unique (0% plagiarism). The first AI detector correctly identified the authors of only 9/21 (42.9%) of the abstracts, whereas the second AI detector identified 14/21 (66.6%).
Conclusion: The inability to correctly identify AI-generated content poses a significant scientific risk, as "false" abstracts can end up in scientific conferences or publications. Neither expertise nor research background was shown to have any meaningful impact on the predictive outcome. A focus on how statistical data are presented may help the differentiation process. Further research is warranted to highlight which elements could help reveal an AI-generated abstract.
Competing Interests: The authors have declared that no competing interests exist.
(Copyright © 2023, Makiev et al.)
Database: MEDLINE