Reduction of Variables for Predicting Breast Cancer Survivability Using Principal Component Analysis

Authors: Naveen Zehra Quazilbash, Shakeel Ahmed Khoja, Sharaf Hussain, Samita Bai
Year of publication: 2015
Subject:
Source: CBMS
DOI: 10.1109/cbms.2015.62
Description: This research uses breast cancer data from the Surveillance, Epidemiology, and End Results (SEER) dataset (1973-2010), which contains 684,394 records. The data are cleaned using several pre-processing techniques. Survivability predictions are proposed using two different methods. In the first method, 14 variables are used, as suggested by Delen et al. [1]. In the second method, the 14 variables are reduced to 5 variables (principal components) using a statistical technique called Principal Component Analysis (PCA); these components capture 98% of the total variance. Both methods yield almost the same level of accuracy, so the number of variables that must be taken into account in the analysis can be reduced.
Database: OpenAIRE
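
The variable-reduction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the SEER records are not part of this record, so the data below are synthetic (14 columns generated from 5 latent factors plus a little noise), and the function name `pca_reduce`, the `variance_target` parameter, and the choice to standardize each column before PCA are assumptions made for illustration only.

```python
import numpy as np


def pca_reduce(X, variance_target=0.98):
    """Project X onto the fewest principal components whose cumulative
    explained variance reaches variance_target (hypothetical helper)."""
    # Standardize each column. Assumption: the abstract does not state
    # how the 14 variables were scaled before PCA.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # SVD of the standardized data; the rows of Vt are the principal directions.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)      # variance share of each component
    cumulative = np.cumsum(explained)
    # Index of the first component at which the target is reached, plus one.
    k = min(int(np.searchsorted(cumulative, variance_target)) + 1, len(S))
    scores = Z @ Vt[:k].T                # reduced representation, shape (n, k)
    return scores, cumulative[:k]


# Synthetic stand-in for the 14 predictor variables (NOT SEER data).
rng = np.random.default_rng(0)
latent = rng.normal(size=(1000, 5))
mixing = rng.normal(size=(5, 14))
X = latent @ mixing + 0.01 * rng.normal(size=(1000, 14))

scores, cumulative = pca_reduce(X, variance_target=0.98)
print("components kept:", scores.shape[1])
print("cumulative variance captured:", cumulative[-1])
```

The reduced `scores` matrix would then take the place of the original 14 columns as input to whichever classifier is used, which is how the accuracy of the two methods in the abstract could be compared.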