Description: |
Auto-segmentation of medical images is critical for improving the efficiency of precision radiology and radiation oncology, thereby improving the quality of care for both health care practitioners and patients. An appropriate metric for evaluating auto-segmentation results is an essential tool for building effective, robust, and practical auto-segmentation techniques. However, the metrics currently in wide use, which compare the predicted segmentation with the ground truth, usually focus either on the overlapping area (Dice Coefficient) or on the most severe boundary shift (Hausdorff Distance), which seems inconsistent with how human readers actually work. Human readers usually verify and correct auto-segmented contours and then use the modified segmentation masks to guide clinical diagnosis or treatment. A metric called the Mendability Index (MI) is proposed to better estimate the effort required to manually edit auto-segmentations of objects of interest in medical images until they become acceptable for the application at hand. Because human readers respond differently to different kinds of errors, MI classifies auto-segmentation errors into three types, each with its own quantitative behavior. MI also accounts for the variability of subjective human delineation. The performance of MI is validated on 505 3D computed tomography (CT) auto-segmentations of 6 objects from 3 institutions, together with the corresponding ground truth and the recorded time experts needed to mend each segmentation manually. The correlation between editing time and each segmentation metric demonstrates that MI is generally a more suitable indicator of mending effort than the Dice Coefficient or the Hausdorff Distance, suggesting that MI may be an effective metric for quantifying the clinical value of auto-segmentations. |
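The contrast drawn above between the two conventional metrics can be illustrated with a minimal Python sketch. It does not implement MI, whose formulation is not given in this abstract; it only computes the Dice Coefficient, 2|P ∩ G| / (|P| + |G|), and the symmetric Hausdorff Distance for toy 2D masks represented as sets of pixel coordinates. The masks, the function names, the convention that two empty masks have Dice 1.0, and the choice to measure the Hausdorff Distance over all mask pixels (many tools use boundary pixels instead) are illustrative assumptions.

```python
import math

def dice(pred, truth):
    """Dice Coefficient of two masks given as sets of pixel coordinates."""
    if not pred and not truth:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2 * len(pred & truth) / (len(pred) + len(truth))

def hausdorff(pred, truth):
    """Symmetric Hausdorff Distance (Euclidean, pixel units) between two
    non-empty point sets; here computed over all mask pixels (an assumption)."""
    def directed(a, b):
        # For each point in a, distance to its nearest point in b; keep the worst.
        return max(min(math.dist(p, q) for q in b) for p in a)
    return max(directed(pred, truth), directed(truth, pred))

# Hypothetical ground truth: a 10 x 10 square.
truth = {(x, y) for x in range(10) for y in range(10)}
# Error 1: a correct square plus one isolated false-positive pixel far away.
pred_speck = truth | {(30, 0)}
# Error 2: the square with its left 3 columns missing (under-segmentation).
pred_strip = {(x, y) for x in range(3, 10) for y in range(10)}

print(f"{dice(pred_speck, truth):.3f}", hausdorff(pred_speck, truth))  # 0.995 21.0
print(f"{dice(pred_strip, truth):.3f}", hausdorff(pred_strip, truth))  # 0.824 3.0
```

In this toy case the isolated speck, which would likely be quick to delete by hand, barely changes the Dice Coefficient but drives the Hausdorff Distance to 21 pixels, whereas the missing strip, which requires redrawing an entire edge, lowers the Dice Coefficient while the Hausdorff Distance stays at 3. Neither number is a direct measure of editing effort, which is the gap MI is intended to address.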