Cesarean Section Classification Using Machine Learning With Feature Selection, Data Balancing, and Explainability

Author:	Nahid Sultan, Mahmudul Hasan, Md. Ferdous Wahid, Hasi Saha, Ahsan Habib
Language:	English
Year of publication:	2023
Subject:	Cesarean section; feature selection; data balancing; machine learning; explainable AI; Electrical engineering. Electronics. Nuclear engineering; TK1-9971
Source:	IEEE Access, Vol 11, Pp 84487-84499 (2023)
Document type:	article
ISSN:	2169-3536
DOI:	10.1109/ACCESS.2023.3303342
Description:	Disease samples are naturally fewer than healthy samples, which introduces bias in the training of machine learning (ML) models. The current study focuses on learning discriminating patterns between cesarean and non-cesarean cases from a dataset of 161 features covering 692 cesarean and 5465 non-cesarean samples, which comes in four folds corresponding to four different hospitals (hospitals A, B, C, and D). The dataset is noisy, contains missing values, and has features at different scales; above all, 161 features is a large number and risks including information that is unnecessary for separating the C-section class from the non-cesarean class. This study introduces a data pre-processing pipeline that resolves data imbalance, handles missing values, identifies and deletes outliers, etc. A novel ensemble model is proposed that performs consistently better irrespective of data volume (data folds A, A+B, A+B+C, and A+B+C+D) and pre-processing pipeline, achieving 96-99% accuracy across data volumes. Finally, the proposed model's decision-making is explained in terms of prominent features: higher values of features such as Episiotomy, age of the woman, and fetal intrapartum pH account for C-section outcomes.
Database:	Directory of Open Access Journals
External link:	https://doaj.org/article/02da2e3751864bab8661871fd51bbb6e
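
The description names four pre-processing concerns (missing values, outliers, features at different scales, and class imbalance, with cesarean samples making up roughly 11% of the 6157 records) but does not say which techniques the authors used. The following is only a minimal sketch of one common choice for each step, using the Python standard library; median imputation, the 1.5 x IQR rule, min-max scaling, random oversampling, and all function names are assumptions made for illustration, not the paper's method.

```python
import random
from statistics import median, quantiles


def impute_median(column):
    """Replace None entries with the median of the observed values (assumed strategy)."""
    observed = [v for v in column if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in column]


def iqr_mask(column, k=1.5):
    """Return True for values inside [Q1 - k*IQR, Q3 + k*IQR], False for outliers."""
    q1, _, q3 = quantiles(column, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [low <= v <= high for v in column]


def min_max_scale(column):
    """Rescale a numeric column to the range [0, 1]."""
    low, high = min(column), max(column)
    span = (high - low) or 1.0  # avoid division by zero for constant columns
    return [(v - low) / span for v in column]


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until every class has equal size."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        out_rows.extend(members + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels
```

In practice, oversampling would be applied to the training split only, after imputation, outlier removal, and scaling, so that duplicated minority rows do not leak into the evaluation data.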