Proof-of-concept study of a small language model chatbot for breast cancer decision support - a transparent, source-controlled, explainable and data-secure approach.

Authors: Griewing S; Institute for Digital Medicine, University Hospital Giessen and Marburg, Philipps-University Marburg, Marburg, Germany. s.griewing@uni-marburg.de.; Stanford Center for Biomedical Informatics Research, Stanford University School of Medicine, Palo Alto, CA, USA. s.griewing@uni-marburg.de.; Marburg Gynecological Cancer Center, Giessen and Marburg University Hospital, Philipps-University Marburg, Marburg, Germany. s.griewing@uni-marburg.de.; Commission Digital Medicine, German Society for Gynecology and Obstetrics (DGGG), Berlin, Germany. s.griewing@uni-marburg.de., Lechner F; Institute for Digital Medicine, University Hospital Giessen and Marburg, Philipps-University Marburg, Marburg, Germany.; Institute for Artificial Intelligence in Medicine, University Hospital Giessen and Marburg, Philipps-University Marburg, Marburg, Germany., Gremke N; Marburg Gynecological Cancer Center, Giessen and Marburg University Hospital, Philipps-University Marburg, Marburg, Germany., Lukac S; Department of Obstetrics and Gynecology, University Hospital Ulm, University of Ulm, Ulm, Germany.; Commission Digital Medicine, German Society for Gynecology and Obstetrics (DGGG), Berlin, Germany., Janni W; Department of Obstetrics and Gynecology, University Hospital Ulm, University of Ulm, Ulm, Germany., Wallwiener M; Halle Gynecological Cancer Center, Halle University Hospital, Martin-Luther-University Halle-Wittenberg, Halle (Saale), Germany.; Commission Digital Medicine, German Society for Gynecology and Obstetrics (DGGG), Berlin, Germany., Wagner U; Marburg Gynecological Cancer Center, Giessen and Marburg University Hospital, Philipps-University Marburg, Marburg, Germany.; Commission Digital Medicine, German Society for Gynecology and Obstetrics (DGGG), Berlin, Germany., Hirsch M; Institute for Artificial Intelligence in Medicine, University Hospital Giessen and Marburg, Philipps-University Marburg, Marburg, Germany., Kuhn S; Institute for Digital Medicine, University Hospital Giessen and Marburg, Philipps-University Marburg, Marburg, Germany.
Language: English
Source: Journal of cancer research and clinical oncology [J Cancer Res Clin Oncol] 2024 Oct 09; Vol. 150 (10), pp. 451. Date of Electronic Publication: 2024 Oct 09.
DOI: 10.1007/s00432-024-05964-3
Abstract: Purpose: Large language models (LLMs) show potential for decision support in breast cancer care. Their use in clinical care is currently precluded by a lack of control over the sources used for decision-making, limited explainability of the decision-making process, and health data security issues. Recently developed small language models (SLMs) are being discussed as a way to address these challenges. This preclinical proof-of-concept study tailors an open-source SLM to the German breast cancer guideline (BC-SLM) to evaluate its initial clinical accuracy and technical functionality in a simulation.
Methods: A multidisciplinary tumor board (MTB) serves as the gold standard for assessing initial clinical accuracy, measured as the concordance of the BC-SLM with the MTB and compared with that of two publicly available LLMs, ChatGPT3.5 and ChatGPT4. The study includes 20 fictional patient profiles and recommendations for 5 treatment modalities, resulting in 100 binary treatment recommendations (recommended or not recommended). Statistical evaluation comprises the percentage concordance with the MTB and Cohen's kappa statistic (κ). Technical functionality is assessed qualitatively in terms of local hosting, adherence to the guideline, and information retrieval.
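The two agreement measures named in the Methods can be illustrated with a minimal sketch. It assumes each binary recommendation is coded as 1 (recommended) or 0 (not recommended); the function names and the ten toy decisions are hypothetical and are not data from the study. The sketch does not compute p-values.

```python
from collections import Counter


def concordance(reference, model):
    """Fraction of decisions where the model matches the reference."""
    if not reference or len(reference) != len(model):
        raise ValueError("inputs must be non-empty and of equal length")
    agree = sum(r == m for r, m in zip(reference, model))
    return agree / len(reference)


def cohens_kappa(reference, model):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(reference)
    p_observed = concordance(reference, model)
    ref_counts = Counter(reference)
    model_counts = Counter(model)
    labels = set(ref_counts) | set(model_counts)
    # Chance agreement: product of the two raters' marginal proportions,
    # summed over all labels.
    p_expected = sum(
        (ref_counts[label] / n) * (model_counts[label] / n) for label in labels
    )
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical toy data (NOT from the study): 1 = recommended, 0 = not recommended.
toy_reference = [1, 1, 1, 0, 0, 1, 0, 1, 0, 0]  # stands in for the MTB decisions
toy_model = [1, 1, 0, 0, 0, 1, 0, 1, 1, 0]      # stands in for a model's decisions

print(f"concordance: {concordance(toy_reference, toy_model):.0%}")
print(f"kappa: {cohens_kappa(toy_reference, toy_model):.3f}")
```

In this toy example the raters agree on 8 of 10 decisions, and because both recommend treatment in half of the cases, the chance agreement is 0.5, which is why kappa is lower than the raw concordance.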
Results: The overall concordance amounts to 86% for BC-SLM (κ = 0.721, p < 0.001), 90% for ChatGPT4 (κ = 0.820, p < 0.001) and 83% for ChatGPT3.5 (κ = 0.661, p < 0.001). Specific concordance for each treatment modality ranges from 65-100% for BC-SLM, 85-100% for ChatGPT4, and 55-95% for ChatGPT3.5. The BC-SLM is locally functional, adheres to the standards of the German breast cancer guideline and provides referenced sections for its decision-making.
Conclusion: The tailored BC-SLM shows initial clinical accuracy and technical functionality, with concordance to the MTB that is comparable to publicly available LLMs like ChatGPT4 and 3.5. This serves as a proof-of-concept for adapting an SLM to an oncological disease and its guideline, addressing prevailing issues with LLMs by ensuring decision transparency, explainability, source control, and data security. It represents a necessary step towards clinical validation and safe use of language models in clinical oncology.
(© 2024. The Author(s).)
Database: MEDLINE