Deep learning for contour quality assurance for RTOG 0933: In-silico evaluation.

Authors: Porter EM (Department of Medical Physics, Wayne State University, Detroit, MI, United States; electronic address: evan.porter@ucsf.edu); Vu C (Department of Radiation Oncology, Corewell Health-East, Royal Oak, MI, United States); Sala IM (Department of Medical Physics, Wayne State University, Detroit, MI, United States; Department of Radiation Oncology, Corewell Health-East, Royal Oak, MI, United States); Guerrero T (Department of Radiation Oncology, Corewell Health-East, Royal Oak, MI, United States); Siddiqui ZA (Department of Radiation Oncology, Corewell Health-East, Royal Oak, MI, United States; electronic address: zaid.siddiqui@bcm.edu).
Language: English
Source: Radiotherapy and oncology : journal of the European Society for Therapeutic Radiology and Oncology [Radiother Oncol] 2024 Dec; Vol. 201, pp. 110519. Date of Electronic Publication: 2024 Aug 31.
DOI: 10.1016/j.radonc.2024.110519
Abstract: Purpose: To validate a CT-based deep learning (DL) hippocampal segmentation model trained on a single-institutional dataset and explore its utility for multi-institutional contour quality assurance (QA).
Methods: A DL model was trained to contour hippocampi from a dataset generated by an institutional observer (IO) contouring on brain MRIs from a single-institution cohort. The model was then evaluated on the RTOG 0933 dataset by comparing the treating physician (TP) contours to blinded IO and DL contours using Dice and Hausdorff distance (HD) agreement metrics, and by evaluating differences in dose to the hippocampi when TP vs. IO vs. DL contours are used for planning. The specificity and sensitivity of the DL model in capturing planning discrepancies were quantified using criteria of HD > 7 mm and hippocampal Dmax > 17 Gy.
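The two agreement metrics and the two QA criteria named in the Methods can be illustrated with a minimal sketch. It assumes contours are represented as sets of voxel index tuples and that the Hausdorff distance is taken over all mask voxels rather than extracted surfaces; the function names, the `spacing` parameter, and the toy masks are hypothetical and not taken from the study. The abstract does not spell out which contour pair each threshold is applied to, so the two `fails_*` helpers only encode the stated cut-offs.

```python
from math import dist


def dice(a, b):
    """Dice coefficient between two voxel sets: 2|A n B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0  # two empty masks are treated as identical
    return 2.0 * len(a & b) / (len(a) + len(b))


def hausdorff(a, b, spacing=(1.0, 1.0, 1.0)):
    """Symmetric Hausdorff distance (mm) between two non-empty voxel sets.

    `spacing` is the assumed voxel size in mm along each axis.
    """
    pa = [tuple(i * s for i, s in zip(p, spacing)) for p in a]
    pb = [tuple(i * s for i, s in zip(p, spacing)) for p in b]

    def directed(src, dst):
        # Largest distance from any point in src to its nearest point in dst.
        return max(min(dist(p, q) for q in dst) for p in src)

    return max(directed(pa, pb), directed(pb, pa))


def fails_contour_qa(hd_mm, threshold_mm=7.0):
    """Contour criterion from the abstract: HD > 7 mm."""
    return hd_mm > threshold_mm


def fails_dose_qa(dmax_gy, threshold_gy=17.0):
    """Dose criterion from the abstract: hippocampal Dmax > 17 Gy."""
    return dmax_gy > threshold_gy


# Toy masks (hypothetical): two 3-voxel lines offset by one voxel.
mask_a = {(0, 0, 0), (1, 0, 0), (2, 0, 0)}
mask_b = {(1, 0, 0), (2, 0, 0), (3, 0, 0)}

print(round(dice(mask_a, mask_b), 3))                      # 0.667
print(hausdorff(mask_a, mask_b))                           # 1.0
print(hausdorff(mask_a, mask_b, spacing=(2.0, 1.0, 1.0)))  # 2.0
```

The brute-force nearest-point search is quadratic in the number of voxels, which is acceptable for small structures such as hippocampi but would normally be replaced by a distance-transform or KD-tree approach for larger volumes.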
Results: The DL model showed greater agreement with IO contours compared to TP contours (DL:IO L/R Dice 74%/73%, HD 4.86/4.74; DL:TP L/R Dice 62%/65%, HD 7.23/6.94, all p < 0.001). Thirty percent of contours and 53% of dose plans failed QA. The DL model achieved an AUC of 0.80/0.79 (L/R) on the contour QA task via Hausdorff comparison and an AUC of 0.91 via Dmax comparison. The false negative rate was 17.2%/20.5% (contours) and 5.8% (dose). False negative cases tended to demonstrate a higher DL:IO Dice agreement (L/R p = 0.42/0.03) and better qualitative visual agreement compared with true positive cases.
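For readers unfamiliar with the reported QA statistics, the following sketch shows how sensitivity, specificity, false negative rate, and AUC can be computed once each case has a reference QA label and a DL-derived score. The data are invented for illustration and do not reproduce any study result; the function names are hypothetical. The false negative rate is computed here as FN / (TP + FN), which is the conventional definition; the abstract does not state which denominator was used.

```python
def confusion_rates(truth, predicted):
    """Sensitivity, specificity and false negative rate from boolean labels."""
    tp = sum(t and p for t, p in zip(truth, predicted))
    fn = sum(t and not p for t, p in zip(truth, predicted))
    fp = sum(p and not t for t, p in zip(truth, predicted))
    tn = sum(not t and not p for t, p in zip(truth, predicted))
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "false_negative_rate": fn / (tp + fn),
    }


def auc(truth, scores):
    """AUC as the probability that a failing case outscores a passing one.

    Equivalent to the Mann-Whitney U statistic; ties count as one half.
    """
    pos = [s for t, s in zip(truth, scores) if t]
    neg = [s for t, s in zip(truth, scores) if not t]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical data: True means the reference review failed the contour.
truth = [True, True, True, False, False, False, False]
# Hypothetical DL-derived Hausdorff distances in mm for the same cases.
scores = [9.1, 7.8, 6.2, 6.5, 4.0, 3.2, 7.4]
# Flag a case when its score exceeds the 7 mm criterion.
predicted = [s > 7.0 for s in scores]

rates = confusion_rates(truth, predicted)
print(rates)
print(round(auc(truth, scores), 3))  # 0.833
```

The AUC summarizes performance across all possible thresholds, whereas sensitivity, specificity, and false negative rate describe the single operating point set by the fixed criterion.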
Conclusion: Our study demonstrates the feasibility of using a single-institutional DL model to perform contour QA on a multi-institutional trial for the task of hippocampal segmentation.
Competing Interests: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
(Copyright © 2024 Elsevier B.V. All rights reserved.)
Database: MEDLINE