F-test feature selection in Stacking ensemble model for breast cancer prediction
Authors: | Dhanya R, Sai Sindhu Akula, Madhumathi Sivakumar, Jyothisha J. Nair, Irene Rose Paul |
---|---|
Year of publication: | 2020 |
Subject: |
Boosting (machine learning); Ensemble forecasting; Computer science; business.industry; Stacking; 020206 networking & telecommunications; Feature selection; 02 engineering and technology; Machine learning; computer.software_genre; Logistic regression; medicine.disease; Support vector machine; Naive Bayes classifier; ComputingMethodologies_PATTERNRECOGNITION; Breast cancer; F-test; 0202 electrical engineering, electronic engineering, information engineering; medicine; General Earth and Planetary Sciences; 020201 artificial intelligence & image processing; Artificial intelligence; business; computer; General Environmental Science |
Source: | Procedia Computer Science. 171:1561-1570 |
ISSN: | 1877-0509 |
DOI: | 10.1016/j.procs.2020.04.167 |
Description: | Cancer data sets contain many details of patient information, of which only a few attributes contribute to accurately predicting the stage of cancer. Certain attributes of the data set play a major role in deciding the type of cancer, i.e. whether it is benign or malignant; feature selection techniques are therefore useful in such scenarios for retaining the relevant feature set. Moreover, to achieve our goal of accurately predicting the stage of cancer, we need an appropriate model that generally yields higher accuracy, and ensemble models prove to be the best choice in such scenarios. In this study, we use existing ensemble techniques along with a combination of supervised machine learning algorithms to develop a new model for breast cancer prediction. We also use feature selection techniques to enhance the performance of the ensemble model. For this purpose, machine learning algorithms such as Support Vector Machines, Naive Bayes, K-Nearest Neighbors and Logistic Regression, and feature selection techniques such as variance threshold and F-test, have been taken into consideration. To achieve higher accuracy for the ensemble model, bagging, boosting and stacking techniques are used. |
Database: | OpenAIRE |
External link: |
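
The description names variance threshold and F-test as the feature selection techniques. As a rough illustration only, and not the authors' implementation, the sketch below drops features whose variance does not exceed a threshold and then ranks the remainder by the one-way ANOVA F-statistic against the class label. The feature names (`radius`, `texture`, `constant`), the toy values, and the helper names `variance`, `anova_f` and `select_features` are all hypothetical; the record does not state which data set, threshold, or number of features the study used.

```python
from collections import defaultdict


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def anova_f(values, labels):
    """One-way ANOVA F-statistic of a single feature against class labels.

    F = (between-class sum of squares / (k - 1))
        / (within-class sum of squares / (n - k))
    where k is the number of classes and n the number of samples.
    """
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[y].append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    ss_within = sum(
        (v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g
    )
    if ss_within == 0:
        # Classes are perfectly separated (or the feature is constant).
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_features(columns, labels, min_variance=0.0, top_k=1):
    """Variance threshold first, then keep the top_k features by F-score.

    columns maps a feature name to its list of values (hypothetical layout).
    """
    kept = {
        name: col for name, col in columns.items() if variance(col) > min_variance
    }
    ranked = sorted(kept, key=lambda name: anova_f(kept[name], labels), reverse=True)
    return ranked[:top_k]


# Toy data, invented for illustration; not taken from the paper.
labels = ["benign", "benign", "benign", "malignant", "malignant", "malignant"]
columns = {
    "radius": [1.0, 2.0, 3.0, 7.0, 8.0, 9.0],    # separates the two classes
    "texture": [1.0, 5.0, 9.0, 2.0, 5.0, 8.0],   # same mean in both classes
    "constant": [4.0, 4.0, 4.0, 4.0, 4.0, 4.0],  # zero variance
}
selected = select_features(columns, labels, top_k=1)
print(selected)
```

On this toy input, `constant` is removed by the variance threshold, `texture` scores an F of 0 because both class means coincide, and `radius` scores 54, so it is the single feature kept. In a pipeline like the one the description outlines, the selected columns would then be passed to the base learners of the ensemble.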