Human‐to‐AI Interrater Agreement for Lung Ultrasound Scoring in COVID‐19 Patients.

Author: Fatima, Noreen; Mento, Federico; Zanforlin, Alessandro; Smargiassi, Andrea; Torri, Elena; Perrone, Tiziano; Demi, Libertario
Source: Journal of Ultrasound in Medicine; Apr2023, Vol. 42 Issue 4, p843-851, 9p
Abstract: Objectives: Lung ultrasound (LUS) has sparked significant interest during the COVID‐19 pandemic. LUS is based on the detection and analysis of imaging patterns. Vertical artifacts and consolidations are some of the recognized patterns in COVID‐19. However, the interrater reliability (IRR) of these findings has not yet been thoroughly investigated. The goal of this study is to assess IRR in LUS COVID‐19 data and determine how many LUS videos and operators are required to obtain a reliable result. Methods: A total of 1035 LUS videos from 59 COVID‐19 patients were included. Videos were randomly selected from a dataset of 1807 videos and scored by six human operators (HOs). The videos were also analyzed by artificial intelligence (AI) algorithms. Fleiss' kappa coefficient results are presented, evaluated at both the video and prognostic levels. Results: Findings show a stable agreement when evaluating a minimum of 500 videos. The statistical analysis illustrates that, at the video level, Fleiss' kappa coefficients of 0.464 (95% confidence interval [CI] = 0.455–0.473) and 0.404 (95% CI = 0.396–0.412) are obtained for pairs of HOs and for AI versus HOs, respectively. At the prognostic level, Fleiss' kappa coefficients of 0.505 (95% CI = 0.448–0.562) and 0.506 (95% CI = 0.458–0.555) are obtained for pairs of HOs and for AI versus HOs, respectively. Conclusions: To examine IRR and obtain a reliable evaluation, a minimum of 500 videos is recommended. Moreover, the employed AI algorithms achieve results that are comparable with HOs. This research further provides a methodology that can be useful to benchmark future LUS studies. [ABSTRACT FROM AUTHOR]
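The agreement statistic reported in the abstract is Fleiss' kappa, which measures multi-rater agreement beyond chance. A minimal sketch of how it can be computed from a ratings-count matrix follows; the matrix shape and example data are purely illustrative and are not taken from the study:

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a ratings matrix.

    counts[i][j] = number of raters who assigned item i (e.g., an LUS video)
    to category j (e.g., a severity score). Every item must be rated by the
    same number of raters.
    """
    N = len(counts)            # number of rated items
    n = sum(counts[0])         # raters per item
    k = len(counts[0])         # number of categories
    total = N * n
    # Observed per-item agreement P_i, averaged into P_bar
    P_i = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    P_bar = sum(P_i) / N
    # Chance agreement P_e from the marginal category proportions
    p_j = [sum(row[j] for row in counts) / total for j in range(k)]
    P_e = sum(p * p for p in p_j)
    return (P_bar - P_e) / (1 - P_e)

# Three items, two raters, two categories, perfect agreement -> kappa = 1.0
print(fleiss_kappa([[2, 0], [0, 2], [2, 0]]))  # 1.0
```

Values around 0.4–0.5, as reported in the abstract, are conventionally read as moderate agreement on the Landis–Koch scale.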
Database: Complementary Index