Author: |
Ogungbire, Abimbola; Pulugurtha, Srinivas S. |
Source: |
Transportation Research Record; November 2024, Vol. 2678 Issue: 11 p88-105, 18p |
Abstract: |
Accurate predictive modeling is often hindered by the prevalent issue of class imbalance within weather-related crash datasets. To address this critical challenge, this study introduces a novel and tailored synthetic data generation technique aimed at effectively handling nominal predictors specific to weather-related crash cases in North Carolina. Data treatment techniques such as the synthetic minority over-sampling technique-nominal (SMOTE-N) and adaptive synthetic-nominal (ADASYN-N) were investigated in this study. A comprehensive comparison of these data treatment techniques is conducted using two prominent machine learning models: the bagging algorithm (random forest [RF]) and the boosting algorithm (extreme gradient boosting [XGBoost]). The findings indicate that the effectiveness of data treatment varies with the severity level and the algorithm used. The ADASYN-N technique was observed to be highly effective for severe and moderate injury crash prediction using both the RF and XGBoost models, while the control dataset and SMOTE-N demonstrated notable performance in property damage only crash prediction using both the RF and XGBoost models. The findings from evaluating the performance of these models with data treatment methods serve as a benchmark for practitioners in selecting appropriate synthetic sample generation techniques, consequently facilitating the development of more accurate crash severity prediction models and contributing to enhanced traffic safety strategies. |
Database: |
Supplemental Index |