Computer-aided diagnosis of lung cancer: the effect of training data sets on classification accuracy of lung nodules

Author:	Ji yu Liu, Jing Gong, Bin Zheng, Xi wen Sun, Sheng dong Nie
Publication year:	2018
Subject:	Male; medicine.medical_specialty; Lung Neoplasms; Support Vector Machine; 030218 nuclear medicine & medical imaging; Machine Learning; 03 medical and health sciences; Naive Bayes classifier; Imaging, Three-Dimensional; 0302 clinical medicine; Carcinoma, Non-Small-Cell Lung; medicine; Humans; Radiology, Nuclear Medicine and imaging; Diagnosis, Computer-Assisted; Stage (cooking); Lung cancer; Neoplasm Staging; Retrospective Studies; Multiple Pulmonary Nodules; Radiological and Ultrasound Technology; Receiver operating characteristic; business.industry; Bayes Theorem; medicine.disease; Linear discriminant analysis; Data set; ROC Curve; Computer-aided diagnosis; Case-Control Studies; 030220 oncology & carcinogenesis; Female; Radiology; Tomography, X-Ray Computed; business; Algorithms
Source:	Physics in Medicine & Biology. 63:035036
ISSN:	1361-6560
DOI:	10.1088/1361-6560/aaa610
Description:	This study aims to develop a computer-aided diagnosis (CADx) scheme for classifying lung nodules as malignant or benign, and to assess whether CADx performance differs when detecting nodules associated with early and advanced stage lung cancer. The study involves 243 biopsy-confirmed pulmonary nodules. Among them, 76 are benign, 81 are stage I malignant and 86 are stage III malignant nodules. The cases are separated into three data sets involving: (1) all nodules, (2) benign and stage I malignant nodules, and (3) benign and stage III malignant nodules. A CADx scheme is applied to segment lung nodules depicted on computed tomography images, and 66 3D image features are initially computed. Then, three machine learning models, namely a support vector machine, a naïve Bayes classifier and linear discriminant analysis, are separately trained and tested using the three data sets and a leave-one-case-out cross-validation method embedded with a Relief-F feature selection algorithm. When the three data sets are used separately to train and test the three classifiers, the average areas under the receiver operating characteristic curves (AUC) are 0.94, 0.90 and 0.99, respectively. When the classifiers are trained on the data set with all nodules, average AUC values are 0.88 and 0.99 for detecting early and advanced stage nodules, respectively. AUC values computed from the three classifiers trained on the same data set are consistent, without statistically significant difference (p > 0.05). This study demonstrates (1) the feasibility of applying a CADx scheme to accurately distinguish between benign and malignant lung nodules, and (2) a positive trend between CADx performance and cancer progression stage. Thus, in order to increase CADx performance in detecting subtle and early cancer, training data sets should include more diverse early stage cancer cases.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::9a74d548abd18bbb69d0356639c85d70 https://doi.org/10.1088/1361-6560/aaa610
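
The evaluation protocol named in the description (leave-one-case-out cross-validation with feature selection embedded in each fold, scored by AUC) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a simplified two-class Relief-style weighting in place of full Relief-F, a Gaussian naïve Bayes classifier standing in for the three models of the study, and synthetic data instead of the 66 image features. All function names and the parameters `k` and `n_keep` are hypothetical.

```python
import math
import random

def relief_weights(X, y, k=3):
    """Simplified two-class Relief-style weights (assumes features scaled to [0, 1])."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for i in range(n):
        # Rank all other samples by Manhattan distance to sample i.
        order = sorted((sum(abs(a - b) for a, b in zip(X[i], X[j])), j)
                       for j in range(n) if j != i)
        hits = [j for _, j in order if y[j] == y[i]][:k]
        misses = [j for _, j in order if y[j] != y[i]][:k]
        for f in range(d):
            # Reward features that differ on nearest misses, penalise on nearest hits.
            w[f] += sum(abs(X[i][f] - X[j][f]) for j in misses) / max(len(misses), 1)
            w[f] -= sum(abs(X[i][f] - X[j][f]) for j in hits) / max(len(hits), 1)
    return w

def gnb_fit(X, y):
    """Fit per-class log-priors and per-feature (mean, variance) pairs."""
    model = {}
    for c in (0, 1):
        rows = [x for x, t in zip(X, y) if t == c]
        stats = []
        for col in zip(*rows):
            mu = sum(col) / len(col)
            var = sum((v - mu) ** 2 for v in col) / len(col) + 1e-6
            stats.append((mu, var))
        model[c] = (math.log(len(rows) / len(X)), stats)
    return model

def gnb_score(model, x):
    """Log-odds of class 1 versus class 0 for one sample."""
    ll = {}
    for c, (prior, stats) in model.items():
        ll[c] = prior + sum(-0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
                            for v, (mu, var) in zip(x, stats))
    return ll[1] - ll[0]

def auc(scores, y):
    """AUC as the probability that a positive outscores a negative (ties count half)."""
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def loo_auc(X, y, n_keep=2):
    """Leave-one-out AUC with scaling and feature selection redone inside every fold."""
    scores = []
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        lo = [min(col) for col in zip(*X_tr)]
        hi = [max(col) for col in zip(*X_tr)]

        def scale(x):
            # Min-max scaling with bounds taken from the training fold only.
            return [(v - a) / (b - a) if b > a else 0.0 for v, a, b in zip(x, lo, hi)]

        X_s = [scale(x) for x in X_tr]
        w = relief_weights(X_s, y_tr)
        keep = sorted(range(len(w)), key=lambda f: -w[f])[:n_keep]
        model = gnb_fit([[x[f] for f in keep] for x in X_s], y_tr)
        x_test = scale(X[i])
        scores.append(gnb_score(model, [x_test[f] for f in keep]))
    return auc(scores, y)

# Synthetic stand-in data: 6 features, of which only the first two carry class signal.
random.seed(0)
y = [0] * 30 + [1] * 30
X = [[random.gauss(3.0 * label if f < 2 else 0.0, 1.0) for f in range(6)] for label in y]
result = loo_auc(X, y)
print(f"leave-one-out AUC: {result:.2f}")
```

Repeating the feature selection inside each fold keeps the held-out case from influencing which features are chosen, which is the point of embedding the selection step in the cross-validation loop.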