Countering imbalanced datasets to improve adverse drug event predictive models in labor and delivery
Author: | R.S. Evans, Michael W. Varner, Nitesh V. Chawla, L.M. Taft, Joyce A. Mitchell, Chi-Ren Shyu, Marlene J. Egger, Sidney N. Thornton, Bruce E. Bray |
---|---|
Year of publication: | 2009 |
Subject: | Drug-Related Side Effects and Adverse Reactions; Decision tree; Health Informatics; Adverse drug events; Machine learning; Models, Biological; Statistics, Nonparametric; Article; Pattern Recognition, Automated; Naive Bayes classifier; Bayes' theorem; Pregnancy; Intensive care; Humans; Medicine; Labor and delivery; Data-mining; Analysis of Variance; Labor, Obstetric; Oversampling; Decision Trees; Reproducibility of Results; Bayes Theorem; Risk factor (computing); Decision Support Systems, Clinical; Delivery, Obstetric; Computer Science Applications; Statistical classification; Identification (information); Databases as Topic; ROC Curve; Female; Artificial intelligence; Data mining; Algorithms; Predictive modelling |
Source: | Journal of Biomedical Informatics 42:356-364 |
ISSN: | 1532-0464 |
DOI: | 10.1016/j.jbi.2008.09.001 |
Description: | Background: The IOM report, Preventing Medication Errors, emphasizes the overall lack of knowledge about the incidence of adverse drug events (ADE). Operating rooms, emergency departments and intensive care units are known to have a higher incidence of ADE. Labor and delivery (L&D) is an emergency care unit that could carry an increased risk of ADE, yet its reported ADE rates remain low and under-reporting is suspected. Identifying risk factors with electronic pattern recognition techniques could improve ADE detection rates. Objective: The objective of the present study is to apply the Synthetic Minority Over-sampling Technique (SMOTE) as an enhanced sampling method on a sparse dataset, in order to generate prediction models that identify ADE in women admitted for labor and delivery based on patient risk factors and comorbidities. Results: By creating synthetic cases with the SMOTE algorithm and evaluating with 10-fold cross-validation, we demonstrated improved performance of the Naïve Bayes and decision tree algorithms. The true positive rate (TPR) increased from 0.32 in the raw dataset to 0.67 in the 800% over-sampled dataset. Conclusion: The performance of classification algorithms can be improved by synthetic minority-class oversampling in sparse clinical datasets. Predictive models created in this manner can be used to develop evidence-based ADE monitoring systems. |
Database: | OpenAIRE |
External link: |
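The SMOTE step named in the description can be sketched as follows. This is a minimal, stdlib-only Python illustration of the general SMOTE recipe (for each minority case, pick one of its k nearest minority-class neighbors and create a new point at a random position on the line segment between the two). It is not the study's code. The function names `smote` and `k_nearest`, the default `k=5`, and the restriction to purely numeric features are assumptions made for illustration; categorical risk factors would need a variant such as SMOTE-NC. An oversampling amount of 800% corresponds to eight synthetic cases per original minority case.

```python
import math
import random


def k_nearest(samples, i, k):
    """Hypothetical helper: indices of the k nearest neighbors of samples[i]
    among the other samples, by Euclidean distance."""
    others = [j for j in range(len(samples)) if j != i]
    others.sort(key=lambda j: math.dist(samples[i], samples[j]))
    return others[:k]


def smote(minority, n_percent=800, k=5, seed=None):
    """Sketch of SMOTE oversampling for the minority class only.

    Assumptions: features are numeric lists of equal length, and n_percent is
    a positive multiple of 100 (800 -> 8 synthetic points per original point).
    """
    if n_percent <= 0 or n_percent % 100 != 0:
        raise ValueError("n_percent must be a positive multiple of 100")
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbors than exist
    per_sample = n_percent // 100
    synthetic = []
    for i, x in enumerate(minority):
        neighbors = k_nearest(minority, i, k)
        for _ in range(per_sample):
            nn = minority[rng.choice(neighbors)]
            gap = rng.random()  # uniform in [0, 1)
            # The new point lies on the segment from x toward its neighbor nn.
            synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic
```

For example, four minority cases oversampled at 800% with `k=3` yield 32 synthetic cases, which are then added to the training data alongside the originals. The TPR reported in the abstract is TP / (TP + FN), the share of actual ADE cases that a model flags.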