Optimal number of strong labels for curriculum learning with convolutional neural network to classify pulmonary abnormalities in chest radiographs

Author: Namkug Kim, Joon Beom Seo, Beomhee Park, Kyung Hee Lee, Sang Min Lee, Yongwon Cho
Year of publication: 2021
Subject:
Source: Computers in biology and medicine. 136
ISSN: 1879-0534
Description: Background and objective: It is important to reduce annotation effort and cost by training efficiently on medical images. We performed a stress test on the number of strong labels for curriculum learning with a convolutional neural network to differentiate normal findings from five types of pulmonary abnormalities in chest radiograph (CXR) images. Methods: The numbers of CXR images of healthy subjects and patients, acquired at Asan Medical Center (AMC), were 6069 and 3465, respectively. The numbers of CXR images of patients with nodules, consolidation, interstitial opacity, pleural effusion, and pneumothorax were 944, 550, 280, 1360, and 331, respectively. The AMC dataset was split into training, tuning, and test sets at a ratio of 7:1:2. All lesions were strongly labeled by expert thoracic radiologists, with confirmation on the corresponding CT. For curriculum learning, normal and abnormal patches (N = 26658) were randomly extracted around the normal lung and around the strongly labeled abnormal lesions, respectively. To determine the optimal number of strong labels, 1%, 5%, 20%, 50%, and 100% of the strong labels were used. Each patch dataset was used to train a ResNet-50 model, and all CXRs with weak labels were then used to fine-tune that model in a transfer-learning manner. A dataset acquired from Seoul National University Bundang Hospital (SNUBH) was used for external validation. Results: The detection accuracies of the 1%, 5%, 20%, 50%, and 100% datasets were 90.51%, 92.15%, 93.90%, 94.54%, and 95.39%, respectively, on the AMC dataset and 90.01%, 90.14%, 90.97%, 91.92%, and 93.00%, respectively, on the SNUBH dataset. Conclusions: Our results show that curriculum learning with a sampling rate of over 20% for strong labels is sufficient to train a model with relatively high performance, and that such a model can be developed easily and efficiently in an actual clinical setting.
Database: OpenAIRE
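
The data preparation described in the abstract (a 7:1:2 split and strong-label sampling rates of 1% to 100%) can be illustrated with a minimal sketch. This is not the authors' code. The function names `split_7_1_2` and `nested_subsets` are hypothetical, the integer IDs stand in for real images and patches, and two choices are assumptions the abstract does not confirm: that the sampling rate is applied to the pool of 26658 patches, and that the smaller subsets are nested inside the larger ones.

```python
import random

# Counts and sampling rates are taken from the abstract.
N_CXR = 6069 + 3465          # healthy subjects + patients in the AMC dataset
N_PATCHES = 26658            # normal and abnormal patches
FRACTIONS = [0.01, 0.05, 0.20, 0.50, 1.00]


def split_7_1_2(n_items, seed=0):
    """Shuffle indices 0..n_items-1 and split them 7:1:2 (train, tune, test)."""
    rng = random.Random(seed)
    indices = list(range(n_items))
    rng.shuffle(indices)
    n_train = n_items * 7 // 10
    n_tune = n_items // 10
    train = indices[:n_train]
    tune = indices[n_train:n_train + n_tune]
    test = indices[n_train + n_tune:]   # remainder, roughly 20%
    return train, tune, test


def nested_subsets(patch_ids, fractions, seed=0):
    """Map each fraction to a random subset of patch_ids.

    Assumption: subsets are nested, i.e. every smaller subset is a
    prefix of the same shuffled list used for the larger ones.
    """
    rng = random.Random(seed)
    shuffled = list(patch_ids)
    rng.shuffle(shuffled)
    return {f: shuffled[:max(1, round(f * len(shuffled)))] for f in fractions}


if __name__ == "__main__":
    train, tune, test = split_7_1_2(N_CXR)
    print(len(train), len(tune), len(test))
    subsets = nested_subsets(range(N_PATCHES), FRACTIONS)
    for fraction, subset in subsets.items():
        print(f"{fraction:.0%}: {len(subset)} patches")
```

Nesting the subsets means that each larger sampling rate only adds patches to the previous one, so differences in accuracy between rates are less likely to come from which patches happened to be drawn. Whether the study sampled this way is not stated in the abstract.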