Evaluating the effectiveness of artificial intelligence-based tools in detecting and understanding sleep health misinformation: Comparative analysis using Google Bard and OpenAI ChatGPT-4.
Author: Garbarino S (Department of Neuroscience, Rehabilitation, Ophthalmology, Genetics and Maternal, Child Sciences (DINOGMI), University of Genoa, Genoa, Italy; Post-Graduate School of Occupational Health, Università Cattolica del Sacro Cuore, Rome, Italy); Bragazzi NL (Department of Neuroscience, Rehabilitation, Ophthalmology, Genetics and Maternal, Child Sciences (DINOGMI), University of Genoa, Genoa, Italy; Laboratory for Industrial and Applied Mathematics (LIAM), Department of Mathematics and Statistics, York University, Toronto, Ontario, Canada; Human Nutrition Unit (HNU), Department of Food and Drugs, University of Parma, Parma, Italy)
Language: English
Source: Journal of Sleep Research [J Sleep Res] 2024 Apr 05, pp. e14210. Date of Electronic Publication: 2024 Apr 05.
DOI: 10.1111/jsr.14210
Abstract: This study evaluates the performance of two major artificial intelligence-based tools (ChatGPT-4 and Google Bard) in debunking sleep-related myths. Specifically, it assessed 20 sleep misconceptions, rated on a 5-point Likert scale for falseness and public health significance, and compared the responses of the artificial intelligence tools with expert opinions. Google Bard correctly identified 19 of the 20 statements as false (95.0% accuracy), a rate not significantly different from ChatGPT-4's (85.0%; Fisher's exact test, p = 0.615). Google Bard's falseness ratings of the sleep misconceptions averaged 4.25 ± 0.70, with moderately negative skewness (-0.42) and kurtosis (-0.83), suggesting a distribution with fewer extreme values than ChatGPT-4's. For public health significance, Google Bard's mean score was 2.4 ± 0.80, with skewness of 0.36 and kurtosis of -0.07, indicating a more nearly normal distribution than ChatGPT-4's. Inter-rater agreement between Google Bard and sleep experts yielded intra-class correlation coefficients of 0.58 for falseness and 0.69 for public health significance, showing moderate alignment (p = 0.065 and p = 0.014, respectively). Text-mining analysis revealed that Google Bard focused on practical advice, whereas ChatGPT-4 concentrated on theoretical aspects of sleep. Readability analysis suggested Google Bard's responses were more accessible, written at an 8th-grade reading level versus ChatGPT-4's 12th-grade complexity. The study demonstrates the potential of artificial intelligence in public health education, especially in sleep health, underscores the importance of accurate and reliable artificial intelligence-generated information, and calls for further collaboration between artificial intelligence developers, sleep health professionals and educators to enhance the effectiveness of sleep health promotion. (© 2024 The Authors. Journal of Sleep Research published by John Wiley & Sons Ltd on behalf of European Sleep Research Society.)
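Note: the headline accuracy comparison can be checked directly from the counts stated in the abstract (Google Bard 19/20 statements correctly identified as false, ChatGPT-4 17/20). The short Python sketch below uses SciPy as an assumption; the paper does not name its statistical software, and exact two-sided p-values can differ slightly across implementations, so this is a plausibility check rather than an exact reproduction.

```python
# Minimal sketch: Fisher's exact test on the accuracy counts from the
# abstract. SciPy is an assumption (the paper does not state its software),
# and two-sided p-values can vary slightly between implementations.
from scipy.stats import fisher_exact

# 2x2 contingency table: rows = tool, columns = (correctly flagged, missed)
table = [
    [19, 1],  # Google Bard: 19/20 misconceptions identified as false (95.0%)
    [17, 3],  # ChatGPT-4: 17/20 (85.0%)
]

odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
print(f"odds ratio = {odds_ratio:.2f}, two-sided p = {p_value:.3f}")
# Prints a two-sided p of roughly 0.605, consistent with the
# non-significant difference reported in the paper (p = 0.615).
```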
Database: MEDLINE
External link: