Using Feature Selection Based Support Vector Machines and Bayes Classification Algorithm to Improve the Predictive Performance of Radiation-Induced Pneumonitis Complication in Breast Cancer

Author: TSENG, CHIN-DAR, 曾慶達
Year of publication: 2019
Document type: 學位論文 ; thesis
Description: 107
Purpose: This thesis presents the design and implementation of a framework for predicting radiation pneumonitis (RP) in radiation therapy for breast cancer. The framework involves two technical issues: feature selection, and machine learning for modeling and prediction. Two feature-selection methods are examined: the Least Absolute Shrinkage and Selection Operator (LASSO) and Markov feature selection (Markov-FS). Three approaches to modeling and prediction are considered: Support Vector Machine (SVM), Bayesian networks (Bayes), and Markov. The findings offer a reference for decision-making in radiation therapy and may help improve radiation prescription and reduce the chance of complications. Materials and methods: Patient data were assessed for 113 patients presenting with RP who were treated with volumetric modulated arc therapy (VMAT). After excluding outliers of more than three standard deviations, the original sample comprised 106 patients. The bootstrap method was used to generate 2120 experimental samples, giving 2226 samples in total when combined with the original samples. The two feature-selection techniques, LASSO and Markov-FS, were used to determine the important predictors of RP. The resulting factor combinations were entered into classification models to evaluate how well each one classifies complications. Four factor combinations were considered: factors-all (F-all), factors-dose (F-dose), factors-LASSO (F-LASSO), and factors-Markov-feature selection (F-Mar-FS). RP prediction models were evaluated with three algorithms (SVM, Bayes, and Markov) for each of the four factor combinations. Finally, the area under the receiver-operating characteristic curve (AUC) and accuracy (ACC) were used to evaluate the performance of each model.
The best classification model was then selected by comparing these results. Results: LASSO selected three factors: age, IV50, and N stage. Markov-FS selected seven factors: BMI, age, IV40, N stage, IV5, IV13, and IV50. The combination of SVM and F-all achieved the best AUC, 0.922. However, the AUC of F-LASSO and F-Mar-FS across the three algorithms ranged from 0.824 to 0.893, indicating that the selected factor combinations also perform well. Because F-Mar-FS uses fewer predictors than F-all, over-fitting can be effectively avoided. In terms of ACC, F-all with SVM reached 0.852, the best of all combinations. F-LASSO and F-Mar-FS reached 0.775 or above; in particular, Markov with F-Mar-FS reached 0.832. This result likewise indicates that the selected factors yield good ACC. Conclusions: The comparisons in this study show that the combination of F-Mar-FS and SVM is the best for accurate prediction of RP and also identifies the important factors. This combination can be very helpful in reducing the incidence of complications in breast cancer patients undergoing radiation therapy.
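The sample counts and the two evaluation metrics in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The abstract does not say how the 2120 bootstrap samples were drawn; 20 resamples of the 106 originals is only one reading consistent with the reported totals, and it is an assumption here. The function names, the 0.5 decision threshold, and the placeholder cohort are likewise hypothetical.

```python
import random


def bootstrap_augment(samples, n_replicates, seed=0):
    """Return (originals + resampled cases, number of resampled cases).

    Each replicate draws len(samples) cases with replacement.
    """
    rng = random.Random(seed)
    n = len(samples)
    generated = []
    for _ in range(n_replicates):
        generated.extend(rng.choice(samples) for _ in range(n))
    return list(samples) + generated, len(generated)


def accuracy(labels, scores, threshold=0.5):
    """ACC: fraction of cases whose thresholded score matches the 0/1 label."""
    hits = sum((s >= threshold) == bool(y) for y, s in zip(labels, scores))
    return hits / len(labels)


def auc(labels, scores):
    """AUC: probability that a positive case outscores a negative one.

    Tied scores count as half a win (the Mann-Whitney formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Placeholder cohort: 106 case indices standing in for the retained patients.
original = list(range(106))

# Assumption: 20 replicates of size 106, which reproduces the reported counts.
combined, n_generated = bootstrap_augment(original, n_replicates=20)
print(n_generated, len(combined))  # 2120 2226
```

Given a model's predicted RP scores and the observed 0/1 outcomes, `auc(labels, scores)` and `accuracy(labels, scores)` give the two figures used to compare the factor combinations.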
Database: Networked Digital Library of Theses & Dissertations