Improving Accuracy of Imbalanced Clinical Data Classification Using Synthetic Minority Over-Sampling Technique

Author: Fatihah Mohd, Noor Maizura Mohamad Noor, Wan Fatin Fatihah Yahya, Suryani Ismail, Masita Abdul Jalil, Mumtazimah Mohamad
Year of publication: 2019
Subject:
Source: Communications in Computer and Information Science ISBN: 9783030363642
DOI: 10.1007/978-3-030-36365-9_8
Description: Imbalanced datasets occur in many real-world applications. Resampling is an effective solution because it produces a balanced class distribution. This study uses the Synthetic Minority Over-sampling Technique (SMOTE), an over-sampling technique, to deal with imbalanced datasets by increasing the number of instances of the minority class. The technique decreases the imbalance percentage of the dataset by generating new synthetic samples, so that a balanced training dataset replaces the class-imbalanced one. The balanced datasets were then used to train machine learning algorithms to diagnose the disease class. In experiments on real-world datasets, an oral cancer dataset and the erythemato-squamous diseases dataset from the UCI machine learning repository, the over-sampling method showed better results in clinical disease classification.
Database: OpenAIRE
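
Illustrative note: the description says SMOTE balances a dataset by generating new synthetic minority-class samples. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes the commonly described form of SMOTE, in which each synthetic sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority-class neighbours. The function name `smote_sketch`, its parameters, and the toy data are hypothetical and chosen only for illustration.

```python
import numpy as np


def smote_sketch(X_min, n_new, k=5, seed=0):
    """Return n_new synthetic samples interpolated between minority samples.

    Hypothetical helper for illustration; X_min holds minority-class rows only.
    """
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = np.random.default_rng(seed)
    k = min(k, n - 1)  # a sample cannot have more neighbours than other samples

    # Pairwise Euclidean distances within the minority class.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]

    # Pick a base sample and one of its k neighbours for each new sample.
    base = rng.integers(0, n, size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]

    # Place the new sample at a random position between base and neighbour.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[picked] - X_min[base])


# Toy data (assumed, not from the study): 20 majority and 5 minority samples.
toy_rng = np.random.default_rng(42)
X_majority = toy_rng.normal(0.0, 1.0, size=(20, 2))
X_minority = toy_rng.normal(3.0, 1.0, size=(5, 2))

# Generate enough synthetic samples to match the majority class size.
X_synth = smote_sketch(X_minority, n_new=len(X_majority) - len(X_minority))
X_minority_balanced = np.vstack([X_minority, X_synth])
```

In this sketch only the minority class is over-sampled, and it would normally be applied to the training split alone so that synthetic samples do not leak into evaluation data. Established libraries provide tested SMOTE implementations and are usually a better choice than hand-written code.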