Edge roughness quantifies impact of physician variation on training and performance of deep learning auto-segmentation models for the esophagus

Autor: Yujie Yan, Christopher Kehayias, John He, Hugo J. W. L. Aerts, Kelly J. Fitzgerald, Benjamin H. Kann, David E. Kozono, Christian V. Guthier, Raymond H. Mak
Jazyk: angličtina
Rok vydání: 2024
Předmět:
Zdroj: Scientific Reports, Vol 14, Iss 1, Pp 1-11 (2024)
Druh dokumentu: article
ISSN: 2045-2322
DOI: 10.1038/s41598-023-50382-z
Popis: Abstract Manual segmentation of tumors and organs-at-risk (OAR) in 3D imaging for radiation-therapy planning is time-consuming and subject to variation between different observers. Artificial intelligence (AI) can assist with segmentation, but challenges exist in ensuring high-quality segmentation, especially for small, variable structures, such as the esophagus. We investigated the effect of variation in segmentation quality and style of physicians for training deep-learning models for esophagus segmentation and proposed a new metric, edge roughness, for evaluating/quantifying slice-to-slice inconsistency. This study includes a real-world cohort of 394 patients who each received radiation therapy (mainly for lung cancer). Segmentation of the esophagus was performed by 8 physicians as part of routine clinical care. We evaluated manual segmentation by comparing the length and edge roughness of segmentations among physicians to analyze inconsistencies. We trained eight multiple- and individual-physician segmentation models in total, based on U-Net architectures and residual backbones. We used the volumetric Dice coefficient to measure the performance for each model. We proposed a metric, edge roughness, to quantify the shift of segmentation among adjacent slices by calculating the curvature of edges of the 2D sagittal- and coronal-view projections. The auto-segmentation model trained on multiple physicians (MD1-7) achieved the highest mean Dice of 73.7 ± 14.8%. The individual-physician model (MD7) with the highest edge roughness (mean ± SD: 0.106 ± 0.016) demonstrated significantly lower volumetric Dice for test cases compared with other individual models (MD7: 58.5 ± 15.8%, MD6: 67.1 ± 16.8%, p
Databáze: Directory of Open Access Journals
Nepřihlášeným uživatelům se plný text nezobrazuje