Validation of a deep learning algorithm identifying diagnostic quality cardiac ultrasound reference views during search phase
Author: O Moal, E Roger, P Reant, A Dezellus, M Tavernier, B Moal, S Lafitte
Year of publication: 2022
Subject:
Source: European Heart Journal - Cardiovascular Imaging, vol. 23
ISSN: 2047-2412, 2047-2404
DOI: 10.1093/ehjci/jeab289.007
Description:

Funding acknowledgements: Type of funding source: private company. Main funding source: DESKi.

Background: Transthoracic echocardiography requires long and demanding training. The use of deep learning algorithms to identify diagnostic quality reference views opens the way to assisting less experienced operators. However, such algorithms have so far been trained and validated only on retrospective data extracted from daily hospital practice, and these data consist of the best clips selected by experts during each exam. Consequently, the performance of these algorithms has not been evaluated on complete acquisitions that include the search phase, i.e. while the operator is moving the probe toward the optimal position, because search phases are not recorded in clinical practice.

Purpose: The objective of this study was to evaluate the performance of a deep learning algorithm that identifies diagnostic quality images for 7 echocardiographic reference views on a prospective dataset including search phases.

Methods: A retrospective dataset of acquisitions extracted from daily hospital practice was created and manually annotated by experts following quality criteria defined for 7 main reference views: parasternal long and short axis, apical 4-, 2-, and 3-chamber, subcostal 4-chamber, and inferior vena cava. Each frame of every acquisition was annotated by one expert and reviewed by a second expert. A deep learning model was trained on the retrospective dataset and evaluated with 5-fold cross-validation. For the prospective dataset, operators recorded the entire acquisition, including the search phase, for each included patient and each reference view; these acquisitions were manually annotated by 2 experts following the same quality criteria. The model trained on retrospective data was then evaluated on this prospective dataset against the experts' manual annotations. Additionally, 3 experts annotated a subset of the prospective dataset to evaluate inter-observer variability.

Results: 481,111 frames from 1,325 patients were annotated for the retrospective dataset. In the 5-fold cross-validation, the algorithm reached an average per-frame accuracy of 89.2 ± 0.7% and an average per-frame F1-score of 87.9 ± 0.9%. For the prospective dataset, 70 patients were included and 143,804 frames were annotated; the average per-patient accuracy was 84.8 ± 5.0% and the average per-patient F1-score was 83.1 ± 8.7%. The average inter-observer agreement rates, computed on 10 patients (23,390 frames), ranged between 82.8 ± 4.9% and 86.3 ± 6.2%.

Conclusions: We propose a deep learning algorithm that automatically identifies 7 of the most common reference views with diagnostic quality, achieving performance within the range of inter-observer variability on acquisitions that include search phases. (A sketch of the per-patient metric aggregation reported above follows this record.)
Database: OpenAIRE
External link:
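The abstract reports per-frame metrics on the retrospective cross-validation, per-patient averages on the prospective set, and raw inter-observer agreement rates. The following is a minimal, hedged sketch of how such per-patient aggregation and pairwise agreement could be computed; the view label names, the "OTHER" class for non-diagnostic search-phase frames, and the data layout are assumptions for illustration and are not specified in the source.

```python
"""Hypothetical sketch: per-patient accuracy / macro F1 and inter-observer agreement
for frame-level echocardiographic view labels. Label set and data layout are assumed."""
from collections import defaultdict

import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Assumed label set: 7 reference views plus a catch-all for search-phase frames.
VIEWS = ["PLAX", "PSAX", "A4C", "A2C", "A3C", "SC4C", "IVC", "OTHER"]


def per_patient_metrics(patient_ids, y_true, y_pred):
    """Return mean +/- std of frame-level accuracy and macro F1, aggregated per patient."""
    frames_by_patient = defaultdict(list)
    for pid, t, p in zip(patient_ids, y_true, y_pred):
        frames_by_patient[pid].append((t, p))

    accuracies, f1s = [], []
    for frames in frames_by_patient.values():
        t, p = zip(*frames)
        accuracies.append(accuracy_score(t, p))
        f1s.append(f1_score(t, p, labels=VIEWS, average="macro", zero_division=0))
    return (np.mean(accuracies), np.std(accuracies)), (np.mean(f1s), np.std(f1s))


def inter_observer_agreement(labels_a, labels_b):
    """Raw agreement rate between two annotators: fraction of frames with identical labels."""
    labels_a, labels_b = np.asarray(labels_a), np.asarray(labels_b)
    return float(np.mean(labels_a == labels_b))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: 3 patients, 100 frames each, labels corrupted ~15% of the time (illustration only).
    pids = np.repeat(["p1", "p2", "p3"], 100)
    truth = rng.choice(VIEWS, size=300)
    preds = np.where(rng.random(300) < 0.85, truth, rng.choice(VIEWS, size=300))

    (acc_mean, acc_std), (f1_mean, f1_std) = per_patient_metrics(pids, truth, preds)
    print(f"accuracy per patient: {acc_mean:.3f} +/- {acc_std:.3f}")
    print(f"macro F1 per patient: {f1_mean:.3f} +/- {f1_std:.3f}")
    print(f"agreement(annotator A, annotator B): {inter_observer_agreement(truth, preds):.3f}")
```

Aggregating per patient rather than pooling all frames prevents patients with long acquisitions from dominating the prospective evaluation, which is consistent with the per-patient means and standard deviations quoted in the abstract.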