Assessing the potential of GPT-4 to perpetuate racial and gender biases in health care: a model evaluation study.

Author: Zack T; Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA, USA; Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA, USA., Lehman E; Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA., Suzgun M; Department of Computer Science, Stanford University, Stanford, CA, USA; Stanford Law School, Stanford University, Stanford, CA, USA., Rodriguez JA; Division of General Internal Medicine, Brigham and Women's Hospital, Boston, MA, USA., Celi LA; Laboratory for Computational Physiology, Massachusetts Institute of Technology, Cambridge, MA, USA; Division of Pulmonary, Critical Care and Sleep Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA; Department of Biostatistics, Harvard T H Chan School of Public Health, Boston, MA, USA., Gichoya J; Department of Radiology, Emory University, Atlanta, GA, USA., Jurafsky D; Department of Computer Science, Stanford University, Stanford, CA, USA; Department of Linguistics, Stanford University, Stanford, CA, USA., Szolovits P; Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA., Bates DW; Division of General Internal Medicine, Brigham and Women's Hospital, Boston, MA, USA; Department of Health Policy and Management, Harvard T H Chan School of Public Health, Boston, MA, USA., Abdulnour RE; Division of Pulmonary and Critical Care Medicine, Brigham and Women's Hospital, Boston, MA, USA; Harvard Medical School, Boston, MA, USA., Butte AJ; Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA, USA; Center for Data-Driven Insights and Innovation, University of California, Office of the President, Oakland, CA, USA., Alsentzer E; Division of General Internal Medicine, Brigham and Women's Hospital, Boston, MA, USA; Harvard Medical School, Boston, MA, USA. Electronic address: ealsentzer@bwh.harvard.edu.
Language: English
Source: The Lancet. Digital health [Lancet Digit Health] 2024 Jan; Vol. 6 (1), pp. e12-e22.
DOI: 10.1016/S2589-7500(23)00225-X
Abstract: Background: Large language models (LLMs) such as GPT-4 hold great promise as transformative tools in health care, ranging from automating administrative tasks to augmenting clinical decision making. However, these models also pose a danger of perpetuating biases and delivering incorrect medical diagnoses, which can have a direct, harmful impact on medical care. We aimed to assess whether GPT-4 encodes racial and gender biases that impact its use in health care.
Methods: Using the Azure OpenAI application programming interface, this model evaluation study tested whether GPT-4 encodes racial and gender biases and examined the impact of such biases on four potential applications of LLMs in the clinical domain: medical education, diagnostic reasoning, clinical plan generation, and subjective patient assessment. We conducted experiments with prompts designed to resemble typical use of GPT-4 within clinical and medical education applications. We used clinical vignettes from NEJM Healer and from published research on implicit bias in health care. GPT-4 estimates of the demographic distribution of medical conditions were compared with true US prevalence estimates. Differential diagnoses and treatment plans were evaluated across demographic groups using standard statistical tests of significance between groups.
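The distributional comparison described in the Methods can be illustrated with a minimal sketch, not the authors' actual protocol: repeatedly prompt a GPT-4 deployment through the Azure OpenAI API to generate patient presentations for a condition, tally the demographics the model produces, and test that tally against a reference prevalence with a chi-square test. The endpoint, deployment name, prompt wording, sample size, and reference split below are placeholders for illustration only.

```python
# Minimal sketch (assumed setup, not the study's code): estimate GPT-4's demographic
# distribution for a condition and compare it with a reference prevalence.
from collections import Counter

from openai import AzureOpenAI     # pip install openai
from scipy.stats import chisquare  # pip install scipy

# Placeholder Azure resource details -- substitute your own endpoint, key, and deployment.
client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com/",
    api_key="<your-key>",
    api_version="2024-02-01",
)

PROMPT = ("Write a one-sentence presentation of a new patient with sarcoidosis, "
          "including the patient's age, race, and gender.")

counts = Counter()
for _ in range(50):  # repeated sampling to estimate the model's output distribution
    resp = client.chat.completions.create(
        model="gpt-4",  # Azure deployment name (placeholder)
        messages=[{"role": "user", "content": PROMPT}],
        temperature=1.0,
    )
    text = resp.choices[0].message.content.lower()
    counts["female" if ("female" in text or "woman" in text) else "male"] += 1

# Hypothetical reference split for illustration; the study used true US prevalence estimates.
expected_share = {"female": 0.6, "male": 0.4}
n = sum(counts.values())
observed = [counts["female"], counts["male"]]
expected = [expected_share["female"] * n, expected_share["male"] * n]

stat, p = chisquare(f_obs=observed, f_exp=expected)
print(f"observed={observed}, expected={expected}, chi2={stat:.2f}, p={p:.3f}")
```

In the same spirit, differences in diagnoses or care plans across demographic groups could be assessed with standard between-group tests (for example, chi-square or Fisher's exact tests on categorical recommendations), which is the kind of comparison the Methods describes.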
Findings: We found that GPT-4 did not appropriately model the demographic diversity of medical conditions, consistently producing clinical vignettes that stereotype demographic presentations. The differential diagnoses created by GPT-4 for standardised clinical vignettes were more likely to include diagnoses that stereotype certain races, ethnicities, and genders. Assessments and plans created by the model showed significant associations between demographic attributes and recommendations for more expensive procedures, as well as differences in patient perception.
Interpretation: Our findings highlight the urgent need for comprehensive and transparent bias assessments of LLM tools such as GPT-4 for intended use cases before they are integrated into clinical care. We discuss the potential sources of these biases and potential mitigation strategies before clinical implementation.
Funding: Priscilla Chan and Mark Zuckerberg.
Competing Interests: Declaration of interests TZ reports no external financial interests; he works in an unpaid role as a clinical consultant with Xyla. EL reports personal fees and equity from Xyla. MS reports personal fees from Xyla and serves as an intern at Microsoft Research. LAC reports travel support from Australia New Zealand College of Intensive Care Medicine, cloud credits from Oracle, Amazon, and Google, and a role as Editor-in-Chief of PLOS Digital Health. JG reports support from the US National Science Foundation (grant #1928481), Radiological Society of North America (grant #EIHD2204), National Institutes of Health (grants 75N92020C00008 and 75N920), AIM-AHEAD, DeepLook, Clarity consortium, and GE Edison; received honoraria from the National Bureau of Economic Research; and has leadership roles with SIIM, HL7, and the ACR Advisory Committee. R-EEA is an employee of Massachusetts Medical Society, which owns NEJM Healer (NEJM Healer cases were used in the study). DWB reports grants and personal fees from EarlySense; personal fees from CDI Negev; equity from ValeraHealth, Clew, MDClone, and Guided Clinical Solutions; personal fees and equity from AESOP and Feelbetter; and grants from IBM Watson Health, outside the submitted work. DWB also has a patent pending (PHC-028564US PCT) on intraoperative clinical decision support. AJB is a cofounder and consultant to Personalis and NuMedii; consultant to Mango Tree Corporation and in the recent past, to Samsung, 10x Genomics, Helix, Pathway Genomics, and Verinata (Illumina); has served on paid advisory panels or boards for Geisinger Health, Regenstrief Institute, Gerson Lehman Group, AlphaSights, Covance, Novartis, Genentech, Merck, and Roche; is a shareholder in Personalis and NuMedii; is a minor shareholder in Apple, Meta (Facebook), Alphabet (Google), Microsoft, Amazon, Snap, 10x Genomics, Illumina, Regeneron, Sanofi, Pfizer, Royalty Pharma, Moderna, Sutro, Doximity, BioNtech, Invitae, Pacific Biosciences, Editas Medicine, Nuna Health, Assay Depot, Vet24seven, and several other non-health related companies and mutual funds; and has received honoraria and travel reimbursement for invited talks from Johnson & Johnson, Roche, Genentech, Pfizer, Merck, Lilly, Takeda, Varian, Mars, Siemens, Optum, Abbott, Celgene, AstraZeneca, AbbVie, Westat, and many academic institutions, medical or disease specific foundations and associations, and health systems. AJB also receives royalty payments through Stanford University for several patents and other disclosures licensed to NuMedii and Personalis. AJB's research has been funded by the National Institutes of Health, Peraton (as the prime on a National Institutes of Health contract), Genentech, Johnson & Johnson, US Food and Drug Administration, Robert Wood Johnson Foundation, Leon Lowenstein Foundation, Intervalien Foundation, Priscilla Chan and Mark Zuckerberg, the Barbara and Gerson Bakar Foundation, and in the recent past, the March of Dimes, Juvenile Diabetes Research Foundation, California Governor's Office of Planning and Research, California Institute for Regenerative Medicine, L’Oreal, and Progenity. EA reports personal fees from Canopy Innovations, Fourier Health, and Xyla; and grants from Microsoft Research. None of these entities had any role in the design, execution, evaluation, or writing of this manuscript. All other authors declare no competing interests.
(Copyright © 2024 The Author(s). Published by Elsevier Ltd. This is an Open Access article under the CC BY 4.0 license.)
Database: MEDLINE