Implementing an AI algorithm in the clinical setting: a case study for the accuracy paradox.
Authors: | Scaringi JA; Department of Diagnostic Imaging, Warren Alpert Medical School of Brown University, Providence, RI, USA., McTaggart RA; Department of Diagnostic Imaging, Warren Alpert Medical School of Brown University, Providence, RI, USA., Alvin MD; Department of Diagnostic Imaging, Warren Alpert Medical School of Brown University, Providence, RI, USA., Atalay M; Department of Diagnostic Imaging, Warren Alpert Medical School of Brown University, Providence, RI, USA.; Brown Radiology Human Factors Lab, Department of Diagnostic Imaging, Warren Alpert Medical School of Brown University, Providence, RI, USA., Bernstein MH; Department of Diagnostic Imaging, Warren Alpert Medical School of Brown University, Providence, RI, USA.; Brown Radiology Human Factors Lab, Department of Diagnostic Imaging, Warren Alpert Medical School of Brown University, Providence, RI, USA., Jayaraman MV; Department of Diagnostic Imaging, Warren Alpert Medical School of Brown University, Providence, RI, USA., Jindal G; Department of Diagnostic Imaging, Warren Alpert Medical School of Brown University, Providence, RI, USA., Movson JS; Department of Diagnostic Imaging, Warren Alpert Medical School of Brown University, Providence, RI, USA., Swenson DW; Department of Diagnostic Imaging, Warren Alpert Medical School of Brown University, Providence, RI, USA., Baird GL; Department of Diagnostic Imaging, Warren Alpert Medical School of Brown University, Providence, RI, USA. gbaird@lifespan.org.; Brown Radiology Human Factors Lab, Department of Diagnostic Imaging, Warren Alpert Medical School of Brown University, Providence, RI, USA. gbaird@lifespan.org.; Lifespan Biostatistics, Epidemiology, and Research Design, Providence, RI, USA. gbaird@lifespan.org. |
---|---|
Language: | English |
Source: | European radiology [Eur Radiol] 2024 Dec 31. Date of Electronic Publication: 2024 Dec 31. |
DOI: | 10.1007/s00330-024-11332-z |
Abstract: | Objectives: We report our experience implementing an algorithm for detecting large vessel occlusion (LVO) in patients with suspected stroke in the emergency setting, including its performance, and offer an explanation for why it was poorly received by radiologists. Materials and Methods: An algorithm for detecting LVO on CT angiography (CTA) was deployed in the emergency department of a single tertiary care hospital from September 1 to 27, 2021. A retrospective analysis of the algorithm's accuracy was performed. Results: During the study period, 48 patients (60.3 years ± 18.2; 32 women) underwent CTA in the emergency department for suspected emergent LVO, 2 of whom were LVO-positive. The LVO algorithm demonstrated a sensitivity of 100% and a specificity of 92%. Although the algorithm's sensitivity at our institution was even higher than the manufacturer's reported values, its false discovery rate was 67%, which led to the perception that the algorithm was inaccurate. In addition, the positive predictive value (PPV) at our institution was 33%, compared with the manufacturer's reported values of 95-98%. This disparity can be attributed to the difference in disease prevalence: 4.1% at our institution compared with 45.0-62.2% in the manufacturer's reported data. Conclusion: Although the LVO algorithm performed as advertised, it was perceived as inaccurate because it produced more false positives than anticipated, and it was removed from clinical practice. This was likely due to a cognitive bias called the accuracy paradox. To mitigate the accuracy paradox, radiologists should be presented with metrics based on a disease prevalence similar to that of their own practice when evaluating and using artificial intelligence tools. Key Points: Question: An artificial intelligence algorithm for detecting emergent LVOs was implemented in an emergency department, but it was perceived to be inaccurate. 
Findings: Although the algorithm's accuracy was both high and as advertised, the algorithm demonstrated a high false discovery rate. Clinical relevance: The misperception that the algorithm was inaccurate was likely due to a special case of the base rate fallacy: the accuracy paradox. Equipping radiologists with an algorithm's false discovery rate based on local prevalence will ensure realistic expectations for real-world performance. Competing Interests: Compliance with ethical standards. Guarantor: The scientific guarantor of this publication is Dr. Grayson L. Baird. Conflict of interest: The authors of this manuscript declare no relationships with any companies whose products or services may be related to the subject matter of the article. Statistics and biometry: One of the authors has significant statistical expertise. No complex statistical methods were necessary for this paper. Informed consent: Written informed consent was waived by the Institutional Review Board. Ethical approval: Institutional Review Board approval was obtained. Study subjects or cohorts overlap: The study cohort has not been previously reported. Methodology: Retrospective; cross-sectional study; performed at one institution. (© 2024. The Author(s), under exclusive licence to European Society of Radiology.) |
Database: | MEDLINE |
External link: |
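The prevalence effect described in the abstract follows from Bayes' theorem. With sensitivity and specificity held fixed, PPV = sens·p / (sens·p + (1 − spec)(1 − p)), where p is disease prevalence, and the false discovery rate is 1 − PPV. The sketch below is an illustration, not the authors' analysis; the function and field names are hypothetical. It plugs the local sensitivity (100%) and specificity (92%) reported in the abstract into that formula at the local prevalence (4.1%) and at the ends of the manufacturer's range (45.0% and 62.2%).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpectedPerformance:
    prevalence: float
    accuracy: float  # fraction of all exams classified correctly
    ppv: float       # positive predictive value (precision)
    fdr: float       # false discovery rate = 1 - PPV


def expected_performance(sensitivity: float, specificity: float,
                         prevalence: float) -> ExpectedPerformance:
    """Expected accuracy, PPV and FDR of a test applied at a given prevalence.

    Hypothetical helper. Sensitivity and specificity are treated as fixed
    properties of the algorithm; prevalence is a property of the site.
    """
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")

    # Joint probabilities of each confusion-matrix cell (Bayes' theorem).
    true_pos = sensitivity * prevalence                    # test+, disease+
    false_pos = (1.0 - specificity) * (1.0 - prevalence)   # test+, disease-
    true_neg = specificity * (1.0 - prevalence)            # test-, disease-

    flagged = true_pos + false_pos
    if flagged == 0.0:
        raise ValueError("test never flags a case; PPV is undefined")
    ppv = true_pos / flagged
    return ExpectedPerformance(
        prevalence=prevalence,
        accuracy=true_pos + true_neg,
        ppv=ppv,
        fdr=1.0 - ppv,
    )


if __name__ == "__main__":
    SENSITIVITY = 1.00  # local value reported in the abstract
    SPECIFICITY = 0.92  # local value reported in the abstract
    # Local prevalence, then the ends of the manufacturer's reported range.
    for p in (0.041, 0.450, 0.622):
        r = expected_performance(SENSITIVITY, SPECIFICITY, p)
        print(f"prevalence {r.prevalence:6.1%}  accuracy {r.accuracy:6.1%}  "
              f"PPV {r.ppv:6.1%}  FDR {r.fdr:6.1%}")
```

At 4.1% prevalence the same algorithm has an expected accuracy of about 92% but a PPV of only about 35% (FDR about 65%), close to the 33% PPV and 67% FDR observed from the small local counts. At 45.0-62.2% prevalence the PPV rises to roughly 91-95% with no change in sensitivity or specificity. The manufacturer's 95-98% figures come from its own validation data, so this sketch is not expected to reproduce them exactly; it only shows that prevalence alone can move PPV this far.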