Enhancing Diagnostic Support for Chiari Malformation and Syringomyelia: A Comparative Study of Contextualized ChatGPT Models.
Authors: | Brown EDL; Department of Neurologic Surgery, Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, New York, USA. Electronic address: ebrown35@northwell.edu., Ward M; Department of Neurologic Surgery, Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, New York, USA., Maity A; Department of Neurologic Surgery, Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, New York, USA., Mittler MA; Department of Neurologic Surgery, Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, New York, USA., Larry Lo SF; Department of Neurologic Surgery, Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, New York, USA., D'Amico RS; Department of Neurologic Surgery, Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, New York, USA. |
---|---|
Language: | English |
Source: | World neurosurgery [World Neurosurg] 2024 Sep; Vol. 189, pp. e86-e107. Date of Electronic Publication: 2024 Jun 01. |
DOI: | 10.1016/j.wneu.2024.05.172 |
Abstract: | Objectives: The rapidly increasing adoption of large language models in medicine has drawn attention to potential applications within the field of neurosurgery. This study evaluates the effects of various contextualization methods on ChatGPT's ability to provide expert-consensus aligned recommendations on the diagnosis and management of Chiari Malformation and Syringomyelia. Methods: Native GPT4 and GPT4 models contextualized using various strategies were asked questions revised from the 2022 Chiari and Syringomyelia Consortium International Consensus Document. ChatGPT-provided responses were then compared to consensus statements using reviewer assessments of 1) responding to the prompt, 2) agreement of ChatGPT response with consensus statements, 3) recommendation to consult with a medical professional, and 4) presence of supplementary information. Flesch-Kincaid, SMOG, word count, and Gunning-Fog readability scores were calculated for each model using the quanteda package in R. Results: Relative to GPT4, all contextualized GPTs demonstrated increased agreement with consensus statements. PDF+Prompting and Prompting models provided the most elevated agreement scores of 19 of 24 and 23 of 24, respectively, versus 9 of 24 for GPT4 (p=.021, p=.001). A trend toward improved readability was observed when comparing contextualized models at large to ChatGPT4, with significant decreases in average word count (180.7 vs 382.3, p<.001) and Flesch-Kincaid Reading Ease score (11.7 vs 17.2, p=.033). Conclusions: The enhanced performance observed in response to ChatGPT4 contextualization suggests broader applications of large language models in neurosurgery than the current literature indicates. This study provides proof of concept for the use of contextualized GPT models in neurosurgical contexts and showcases the easy accessibility of improved model performance. (Copyright © 2024 Elsevier Inc. All rights reserved.) |
Database: | MEDLINE |
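
The readability measures named in the Methods (word count, Flesch-Kincaid, SMOG, Gunning-Fog) follow standard published formulas. The sketch below is a minimal illustration of those formulas in Python, not the authors' pipeline: the study used the quanteda package in R, and the function names `count_syllables` and `readability`, the regex tokenization, and the vowel-group syllable heuristic are all assumptions made here for illustration, so the numbers will not match quanteda's output exactly.

```python
import math
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): one syllable per run of vowels, minimum 1."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def readability(text: str) -> dict:
    """Compute word count and four standard readability scores for a text."""
    # Naive splitting: sentences end at ., ! or ?; words are runs of letters.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    syllables = [count_syllables(w) for w in words]
    n_sent, n_words = len(sentences), len(words)
    # Words with three or more syllables count as "complex"/polysyllabic.
    polysyllables = sum(1 for s in syllables if s >= 3)

    wps = n_words / n_sent          # average words per sentence
    spw = sum(syllables) / n_words  # average syllables per word

    return {
        "word_count": n_words,
        # Flesch Reading Ease: higher means easier to read.
        "flesch_reading_ease": 206.835 - 1.015 * wps - 84.6 * spw,
        # The three grade-level scores below: higher means harder to read.
        "flesch_kincaid_grade": 0.39 * wps + 11.8 * spw - 15.59,
        "gunning_fog": 0.4 * (wps + 100 * polysyllables / n_words),
        "smog": 1.043 * math.sqrt(polysyllables * 30 / n_sent) + 3.1291,
    }
```

Calling `readability(response_text)` on each model response and averaging the resulting values per model would give per-model scores of the kind the abstract compares. A dictionary-based syllable counter would be needed for results closer to those of an established package.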