F-test feature selection in Stacking ensemble model for breast cancer prediction

Authors: Dhanya R, Sai Sindhu Akula, Madhumathi Sivakumar, Jyothisha J. Nair, Irene Rose Paul
Year of publication: 2020
Subject:
Source: Procedia Computer Science. 171:1561-1570
ISSN: 1877-0509
DOI: 10.1016/j.procs.2020.04.167
Abstract: Cancer data sets contain many details of patient information, of which only a few attributes contribute to accurately predicting the stage of cancer. Certain attributes play a major role in deciding the type of cancer, i.e. whether it is benign or malignant, so feature selection techniques are useful in such scenarios for retaining the relevant feature set. Moreover, to achieve our goal of accurately predicting the stage of cancer, we need an appropriate model that generally yields higher accuracy, and an ensemble model proves to be the best choice for such scenarios. In this study, we use existing ensemble techniques together with a combination of supervised machine learning algorithms to develop a new model for breast cancer prediction. We also use feature selection techniques to enhance the performance of the ensemble model. For this purpose, the machine learning algorithms Support Vector Machines, Naive Bayes, K-Nearest Neighbors and Logistic Regression, and the feature selection techniques variance threshold and F-test, have been considered. To achieve higher accuracy for the ensemble model, bagging, boosting and stacking techniques are used.
Database: OpenAIRE
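The abstract names the components but not how they are wired together. The following is a minimal sketch, not the authors' implementation, of variance-threshold and F-test feature selection feeding a stacking ensemble, written with scikit-learn. Several choices are assumptions not stated in the record: the Wisconsin Diagnostic Breast Cancer data bundled with scikit-learn (`load_breast_cancer`) stands in for the paper's data set; SVM, Naive Bayes and KNN act as base learners with Logistic Regression as the meta-learner; `k=10` selected features is an arbitrary illustrative value; and `build_model`, `evaluate` and `top_f_features` are hypothetical helper names. Bagging and boosting, also mentioned in the abstract, are not shown.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import StackingClassifier
from sklearn.feature_selection import SelectKBest, VarianceThreshold, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_model(k=10, var_threshold=0.0):
    """Hypothetical pipeline: variance threshold -> F-test top-k -> scaling -> stacking.

    Assumption: SVM, NB and KNN are base learners and Logistic Regression is the
    meta-learner; the paper's exact arrangement is not given in the abstract.
    """
    base_learners = [
        ("svm", SVC(kernel="rbf", random_state=0)),
        ("nb", GaussianNB()),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
    ]
    stack = StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,  # out-of-fold base-learner predictions train the meta-learner
    )
    return make_pipeline(
        VarianceThreshold(threshold=var_threshold),  # 0.0 drops only constant features
        SelectKBest(score_func=f_classif, k=k),      # keep the k highest ANOVA F-scores
        StandardScaler(),                            # KNN and RBF-SVM are scale-sensitive
        stack,
    )


def evaluate(k=10):
    """5-fold cross-validated accuracy; selection is refit inside every fold."""
    X, y = load_breast_cancer(return_X_y=True)
    return cross_val_score(build_model(k=k), X, y, cv=5, scoring="accuracy")


def top_f_features(k=10):
    """Names of the k features with the largest F-statistic on the full data."""
    data = load_breast_cancer()
    f_scores, _p_values = f_classif(data.data, data.target)
    order = np.argsort(f_scores)[::-1][:k]
    return [data.feature_names[i] for i in order]


if __name__ == "__main__":
    scores = evaluate(k=10)
    print(f"mean accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
    print("top F-test features:", top_f_features(k=10))
```

`f_classif` scores each feature with the one-way ANOVA F-statistic between the benign and malignant groups, so `SelectKBest` keeps the features whose class means differ most relative to their within-class spread. Because the feature selectors sit inside the pipeline, `cross_val_score` refits them on each training fold, which keeps the held-out fold from influencing which features are chosen.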