Almanac: Retrieval-Augmented Language Models for Clinical Medicine.
Authors: | Zakka C; Department of Cardiothoracic Surgery, Stanford Medicine., Chaurasia A; Department of Cardiothoracic Surgery, Stanford Medicine.; Department of Computer Science, Stanford University., Shad R; Division of Cardiovascular Surgery, Penn Medicine., Dalal AR; Department of Cardiothoracic Surgery, Stanford Medicine., Kim JL; Department of Cardiothoracic Surgery, Stanford Medicine., Moor M; Department of Computer Science, Stanford University., Alexander K; Division of Cardiovascular Medicine, Stanford Medicine., Ashley E; Division of Cardiovascular Medicine, Stanford Medicine., Boyd J; Department of Cardiothoracic Surgery, Stanford Medicine., Boyd K; Department of Pediatrics, Stanford Medicine., Hirsch K; Department of Neurology, Stanford Medicine., Langlotz C; Department of Radiology and Biomedical Informatics, Stanford Medicine., Nelson J; Division of Infectious Diseases, Stanford Medicine., Hiesinger W; Department of Cardiothoracic Surgery, Stanford Medicine. |
---|---|
Language: | English |
Source: | Research square [Res Sq] 2023 May 02. Date of Electronic Publication: 2023 May 02. |
DOI: | 10.21203/rs.3.rs-2883198/v1 |
Abstract: | Large language models have recently demonstrated impressive zero-shot capabilities in a variety of natural language tasks such as summarization, dialogue generation, and question-answering. Despite many promising applications in clinical medicine, adoption of these models in real-world settings has been largely limited by their tendency to generate incorrect and sometimes even toxic statements. In this study, we develop Almanac, a large language model framework augmented with retrieval capabilities for medical guideline and treatment recommendations. Performance on a novel dataset of clinical scenarios (n = 130) evaluated by a panel of 5 board-certified and resident physicians demonstrates significant increases in factuality (mean of 18% at p-value < 0.05) across all specialties, with improvements in completeness and safety. Our results demonstrate the potential for large language models to be effective tools in the clinical decision-making process, while also emphasizing the importance of careful testing and deployment to mitigate their shortcomings. Competing Interests: The authors declare no competing interests. |
Database: | MEDLINE |