A study on using deep autoencoders for imbalanced binary classification
Author: | Vlad-Ioan Tomescu, Ştefan Niţică, Gabriela Czibula |
---|---|
Publication year: | 2021 |
Subject: |
Computer science
Deep learning; Supervised learning; Machine learning; Field (computer science); Domain (software engineering); Breast cancer; Binary classification; Application domain; General Earth and Planetary Sciences; Artificial intelligence; Set (psychology); General Environmental Science |
Source: | KES |
ISSN: | 1877-0509 |
DOI: | 10.1016/j.procs.2021.08.013 |
Description: | Imbalanced classification is a challenge for supervised learning, because an unequal distribution of classes in the training data set is generally associated with poor predictive performance for the minority class. From a practical perspective, however, the minority class is usually the most relevant one. Because of the imbalance in the training data, classification errors for the minority class are higher, as classifiers tend to be biased towards predicting the majority class. In this paper we investigate the use of autoencoders for improving predictive performance on imbalanced binary classification problems. As an application domain we consider breast cancer detection, which is an imbalanced classification problem of great interest in the medical domain. According to the World Health Organisation, breast cancer represents the primary cause of cancer mortality in women. There is currently an increasing interest in applying conventional machine learning and, more recently, deep learning techniques in the breast cancer detection field to help medical experts detect the disease early. One of the paper's goals is to investigate the ability of deep autoencoders to learn patterns within the classes of benign and malignant instances. Secondly, we propose and compare two autoencoder-based classification models for breast cancer detection. The performance of the proposed models was empirically assessed on data sets previously used in the breast cancer detection literature. The results show that our best model compares favourably with most of the classifiers used for comparison and that it handles the data imbalance well. |
Database: | OpenAIRE |