Crowdsourcing with Enhanced Data Quality Assurance: An Efficient Approach to Mitigate Resource Scarcity Challenges in Training Large Language Models for Healthcare

Autor: Barai, P., Leroy, G., Bisht, P., Rothman, J. M., Lee, S., Andrews, J., Rice, S. A., Ahmed, A.
Rok vydání: 2024
Předmět:
Druh dokumentu: Working Paper
Popis: Large Language Models (LLMs) have demonstrated immense potential in artificial intelligence across various domains, including healthcare. However, their efficacy is hindered by the need for high-quality labeled data, which is often expensive and time-consuming to create, particularly in low-resource domains like healthcare. To address these challenges, we propose a crowdsourcing (CS) framework enriched with quality control measures at the pre-, real-time-, and post-data gathering stages. Our study evaluated the effectiveness of enhancing data quality through its impact on LLMs (Bio-BERT) for predicting autism-related symptoms. The results show that real-time quality control improves data quality by 19 percent compared to pre-quality control. Fine-tuning Bio-BERT using crowdsourced data generally increased recall compared to the Bio-BERT baseline but lowered precision. Our findings highlighted the potential of crowdsourcing and quality control in resource-constrained environments and offered insights into optimizing healthcare LLMs for informed decision-making and improved patient care.
Comment: Published in AMIA Summit, Boston, 2024. https://knowledge.amia.org/Info2024/pdf/Info2024a022/Info2024fl021
Databáze: arXiv