Cesarean Section Classification Using Machine Learning With Feature Selection, Data Balancing, and Explainability

Author: Nahid Sultan, Mahmudul Hasan, Md. Ferdous Wahid, Hasi Saha, Ahsan Habib
Language: English
Year of publication: 2023
Subject:
Source: IEEE Access, Vol 11, Pp 84487-84499 (2023)
Document type: article
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2023.3303342
Description: Disease samples are naturally fewer than healthy samples, which introduces bias into the training of machine learning (ML) models. The current study focuses on learning discriminating patterns between cesarean and non-cesarean cases from a dataset of 161 features covering a total of 692 cesarean and 5465 non-cesarean samples, which comes in four folds corresponding to four different hospitals (hospitals A, B, C, and D). The dataset is noisy, contains missing values, and has features on different scales; above all, 161 features is a large number and risks including information that is unnecessary for learning to separate the C-section class from the non-cesarean class. This study introduces a data pre-processing pipeline that resolves data imbalance, handles missing values, identifies and deletes outliers, etc. A novel ensemble model is proposed that consistently performs better irrespective of data volume (data folds A, A+B, A+B+C, and A+B+C+D) and pre-processing pipeline, achieving 96-99% accuracy across data volumes. Finally, the proposed model's decision-making is explained in terms of prominent features: higher values of features such as Episiotomy, age of the woman, and Fetal intrapartum pH account for C-section.
Database: Directory of Open Access Journals
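
The description above names several pre-processing steps (balancing, missing-value handling, scaling, feature selection) followed by an ensemble classifier. The record does not say which specific techniques the authors used, so the following is only a minimal sketch of how such a pipeline could be assembled with scikit-learn, not the paper's method. Everything here is an assumption made for illustration: the synthetic data (about 11% minority class, loosely mirroring the 692 vs 5465 split), random oversampling, median imputation, standardization, ANOVA-based `SelectKBest`, a soft-voting ensemble of logistic regression and a random forest, and the hypothetical helper names `oversample_minority` and `build_model`. Outlier removal is omitted.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.utils import resample


def oversample_minority(X, y, random_state=0):
    """Hypothetical helper: duplicate minority rows (binary labels) until classes match."""
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    X_min, X_maj = X[y == minority], X[y != minority]
    y_maj = y[y != minority]
    X_up = resample(X_min, replace=True, n_samples=len(X_maj),
                    random_state=random_state)
    y_up = np.full(len(X_maj), minority)
    return np.vstack([X_maj, X_up]), np.concatenate([y_maj, y_up])


def build_model(k_features=10):
    """Hypothetical helper: impute -> scale -> select features -> soft-voting ensemble."""
    ensemble = VotingClassifier(
        estimators=[
            ("lr", LogisticRegression(max_iter=1000)),
            ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ],
        voting="soft",
    )
    return Pipeline([
        ("impute", SimpleImputer(strategy="median")),   # fill missing values
        ("scale", StandardScaler()),                    # put features on one scale
        ("select", SelectKBest(f_classif, k=k_features)),  # drop uninformative features
        ("ensemble", ensemble),
    ])


# Synthetic stand-in data (assumption): imbalanced classes plus 5% missing cells.
X, y = make_classification(n_samples=1000, n_features=40, n_informative=8,
                           weights=[0.89, 0.11], random_state=0)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.25, random_state=0)

# Balance only the training split so duplicated rows never leak into the test set.
X_bal, y_bal = oversample_minority(X_train, y_train)
model = build_model().fit(X_bal, y_bal)
print("held-out accuracy:", model.score(X_test, y_test))
```

Balancing is applied after the train/test split on purpose: oversampling before splitting would place copies of the same row on both sides and inflate the reported accuracy.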