Literature search sandbox: a large language model that generates search queries for systematic reviews.

Author: Adam GP; Center for Evidence Synthesis in Health, Brown University School of Public Health, Providence, RI 02903, United States., DeYoung J; Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, United States., Paul A; Department of Biostatistics, Brown University School of Public Health, Providence, RI 02903, United States., Saldanha IJ; Center for Evidence Synthesis in Health, Brown University School of Public Health, Providence, RI 02903, United States.; Center for Clinical Trials and Evidence Synthesis, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, United States., Balk EM; Center for Evidence Synthesis in Health, Brown University School of Public Health, Providence, RI 02903, United States., Trikalinos TA; Center for Evidence Synthesis in Health, Brown University School of Public Health, Providence, RI 02903, United States.; Department of Biostatistics, Brown University School of Public Health, Providence, RI 02903, United States., Wallace BC; Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, United States.
Language: English
Source: JAMIA open [JAMIA Open] 2024 Sep 25; Vol. 7 (3), pp. ooae098. Date of Electronic Publication: 2024 Sep 25 (Print Publication: 2024).
DOI: 10.1093/jamiaopen/ooae098
Abstract: Objectives: Development of search queries for systematic reviews (SRs) is time-consuming. In this work, we capitalize on recent advances in large language models (LLMs) and a relatively large dataset of natural language descriptions of reviews and corresponding Boolean searches to generate Boolean search queries from SR titles and key questions.
Materials and Methods: We curated a training dataset of 10 346 SR search queries registered in PROSPERO. We used this dataset to fine-tune a set of models to generate search queries based on Mistral-Instruct-7b. We evaluated the models quantitatively using an evaluation dataset of 57 SRs and qualitatively through semi-structured interviews with 8 experienced medical librarians.
Results: The model-generated search queries had median sensitivity of 85% (interquartile range [IQR] 40%-100%) and number needed to read of 1206 citations (IQR 205-5810). The interviews suggested that the models lack both the necessary sensitivity and precision to be used without scrutiny but could be useful for topic scoping or as initial queries to be refined.
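The two reported metrics, sensitivity and number needed to read (NNR), have standard definitions in search-strategy evaluation. A minimal sketch of those definitions follows; the function names and counts are illustrative assumptions, not data from the study:

```python
# Hedged sketch of the two evaluation metrics reported in the Results.
# Definitions follow common usage in systematic-review search evaluation;
# the example counts below are invented for illustration.

def sensitivity(relevant_retrieved: int, relevant_total: int) -> float:
    """Fraction of all known relevant citations that the search retrieved."""
    return relevant_retrieved / relevant_total

def number_needed_to_read(total_retrieved: int, relevant_retrieved: int) -> float:
    """Citations a reviewer must screen per relevant citation found (1 / precision)."""
    return total_retrieved / relevant_retrieved

# Illustrative example: a query retrieves 6000 citations, 5 of which are
# among a review's 10 known relevant studies.
print(sensitivity(5, 10))              # 0.5
print(number_needed_to_read(6000, 5))  # 1200.0
```

On these definitions, a high NNR (such as the median of 1206 reported above) means reviewers would screen on the order of a thousand citations for each relevant one found.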
Discussion: Future research should focus on improving the dataset with more high-quality search queries, assessing whether fine-tuning the model on other fields, such as the population and intervention, improves performance, and exploring the addition of interactivity to the interface.
Conclusions: The datasets developed for this project can be used to train and evaluate LLMs that map review descriptions to Boolean search queries. The models cannot replace thoughtful search query design but may be useful in providing suggestions for key words and the framework for the query.
Competing Interests: The authors have no competing interests to declare.
(© The Author(s) 2024. Published by Oxford University Press on behalf of the American Medical Informatics Association.)
Database: MEDLINE