Sample Size Considerations for Fine-Tuning Large Language Models for Named Entity Recognition Tasks: Methodological Study.
Author: | Majdik ZP (Department of Communication, North Dakota State University, Fargo, ND, United States); Graham SS (Department of Rhetoric & Writing, The University of Texas at Austin, Austin, TX, United States); Shiva Edward JC (Department of Rhetoric & Writing, The University of Texas at Austin, Austin, TX, United States); Rodriguez SN (Department of Neurology, The Dell Medical School, The University of Texas at Austin, Austin, TX, United States); Karnes MS (Department of Rhetoric & Writing, University of Arkansas Little Rock, Little Rock, AR, United States); Jensen JT (Department of Rhetoric & Writing, The University of Texas at Austin, Austin, TX, United States); Barbour JB (Department of Communication, The University of Illinois at Urbana-Champaign, Urbana, IL, United States); Rousseau JF (Statistical Planning and Analysis Section, Department of Neurology, The University of Texas Southwestern Medical Center, Dallas, TX, United States; Peter O'Donnell Jr. Brain Institute, The University of Texas Southwestern Medical Center, Dallas, TX, United States) |
---|---|
Language: | English |
Source: | JMIR AI [JMIR AI] 2024 May 16; Vol. 3, pp. e52095. Date of Electronic Publication: 2024 May 16. |
DOI: | 10.2196/52095 |
Abstract: | Background: Large language models (LLMs) have the potential to support promising new applications in health informatics. However, practical data on sample size considerations for fine-tuning LLMs to perform specific tasks in biomedical and health policy contexts are lacking. Objective: This study aims to evaluate sample size and sample selection techniques for fine-tuning LLMs to support improved named entity recognition (NER) for a custom data set of conflicts of interest disclosure statements. Methods: A random sample of 200 disclosure statements was prepared for annotation. All "PERSON" and "ORG" entities were identified by each of the 2 raters, and once appropriate agreement was established, the annotators independently annotated an additional 290 disclosure statements. From the 490 annotated documents, 2500 stratified random samples in different size ranges were drawn. The 2500 training set subsamples were used to fine-tune a selection of language models across 2 model architectures (Bidirectional Encoder Representations from Transformers [BERT] and Generative Pre-trained Transformer [GPT]) for improved NER, and multiple regression was used to assess the relationship between sample size (sentences), entity density (entities per sentence [EPS]), and trained model performance (F1-score). Results: Fine-tuned models ranged in topline NER performance from F Conclusions: Relatively modest sample sizes can be used to fine-tune LLMs for NER tasks applied to biomedical text, and training data entity density should representatively approximate entity density in production data. Training data quality and a model architecture's intended use (text generation vs text processing or classification) may be as important as, or more important than, training data volume and model parameter size. (©Zoltan P Majdik, S Scott Graham, Jade C Shiva Edward, Sabrina N Rodriguez, Martha S Karnes, Jared T Jensen, Joshua B Barbour, Justin F Rousseau. Originally published in JMIR AI (https://ai.jmir.org), 16.05.2024.) A hypothetical code sketch of the subsampling and regression design follows this record. |
Database: | MEDLINE |
External link: |
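
The abstract describes its design in prose only, and the record does not include the authors' code. Below is a minimal Python sketch, under stated assumptions, of the two quantitative ideas in the Methods: drawing training subsamples stratified by size (in sentences), computing entity density (entities per sentence, EPS), and regressing per-subsample F1-scores on those two predictors. The data structures, size ranges, and simulated F1 values are all hypothetical stand-ins; nothing here reproduces the study's data, code, or results.

```python
# Hypothetical sketch of the abstract's design: stratified subsampling by
# size, entity density (EPS), and a multiple regression of F1 on both.
# All values below are simulated for illustration only.
import random
from dataclasses import dataclass

import numpy as np
import statsmodels.api as sm  # assumed available; any OLS routine would do


@dataclass
class AnnotatedDoc:
    """One annotated disclosure statement, reduced to counts."""
    n_sentences: int
    n_entities: int  # PERSON + ORG spans combined


def entities_per_sentence(docs):
    """Entity density (EPS) of a sample: total entities / total sentences."""
    sents = sum(d.n_sentences for d in docs)
    ents = sum(d.n_entities for d in docs)
    return ents / sents if sents else 0.0


def draw_subsamples(docs, size_ranges, n_per_range, rng):
    """Draw random training subsamples whose sentence counts fall in target
    ranges, loosely echoing the abstract's stratified-sampling design."""
    out = []
    for lo, hi in size_ranges:
        for _ in range(n_per_range):
            target = rng.randint(lo, hi)
            pool = list(docs)
            rng.shuffle(pool)
            sample, total = [], 0
            for d in pool:
                if total >= target:
                    break
                sample.append(d)
                total += d.n_sentences
            out.append(sample)
    return out


rng = random.Random(0)
# 490 fake documents standing in for the annotated disclosure statements.
docs = [AnnotatedDoc(rng.randint(1, 12), rng.randint(0, 15)) for _ in range(490)]
subs = draw_subsamples(docs, [(50, 200), (200, 800), (800, 2000)], 20, rng)

size = np.array([sum(d.n_sentences for d in s) for s in subs], dtype=float)
eps = np.array([entities_per_sentence(s) for s in subs])
# Simulated stand-in for the F1-score each fine-tuned model would achieve.
f1 = 0.80 + 1e-5 * size + 0.05 * eps + np.random.default_rng(1).normal(0, 0.01, len(subs))

# Multiple regression: F1 ~ sample size (sentences) + entity density (EPS).
X = sm.add_constant(np.column_stack([size, eps]))
print(sm.OLS(f1, X).fit().summary())
```

In the study itself, each subsample would instead be used to fine-tune a BERT- or GPT-family model for NER, and the observed F1-score on held-out data would replace the simulated values before fitting the regression.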