Application of Machine Learning Techniques to Predict the Occurrence of Distraction-affected Crashes with Phone-Use Data
Authors: | Yongxin Peng, Lingtao Wu, Xiaoyu Guo, Xiaoqiang Kong, Xiubin Wang, Chaolun Ma |
---|---|
Year of publication: | 2021 |
Source: | Transportation Research Record: Journal of the Transportation Research Board. 2676:692-705 |
ISSN: | 2169-4052, 0361-1981 |
Description: | Distraction occurs when a driver's attention is diverted from driving to a secondary task. The number of distraction-affected crashes has been increasing in recent years. Accurately predicting distraction-affected crashes is critical for roadway agencies seeking to reduce distracted driving behaviors and distraction-affected crashes. Emerging phone-use data and machine learning techniques are increasingly available to safety researchers and could improve the prediction of distraction-affected crashes. This study therefore first examines whether phone-use events provide essential information about distraction-affected crashes. The authors apply a machine learning technique, XGBoost, under two scenarios, with and without phone-use events, and compare its performance with that of two conventional statistical models: a logistic regression model and a mixed-effects logistic regression model. The comparison demonstrates the superiority of XGBoost over logistic regression on a high-dimensional, imbalanced dataset. The study further applies SHAP (SHapley Additive exPlanations) to interpret the results and analyze the importance of individual features related to distraction-affected crashes, and tests its ability to improve prediction accuracy. The trained XGBoost model achieves a sensitivity of 91.59%, a specificity of 85.92%, and an accuracy of 88.72%. The XGBoost and SHAP results suggest that (1) phone-use information is an important factor associated with the occurrence of distraction-affected crashes, and (2) distraction-affected crashes are more likely to occur on roadway segments with higher exposure (i.e., greater length and traffic volume), uneven traffic flow conditions, or medium truck volumes. |
Database: | OpenAIRE |
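
The description reports sensitivity, specificity, and accuracy, and relies on SHAP values to rank features. The Python sketch below illustrates both ideas under stated assumptions. The confusion-matrix counts and the three-feature `toy_risk_score` function are hypothetical stand-ins, not the authors' data or model. The Shapley values are computed by brute force against a single baseline input, a simplification of what SHAP implementations do. Those implementations average over background data and use efficient tree-specific algorithms for ensembles such as XGBoost.

```python
from itertools import combinations
from math import factorial


def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts.

    Positives are crash-affected cases (e.g. distraction-affected crashes).
    """
    sensitivity = tp / (tp + fn)               # share of true crashes detected
    specificity = tn / (tn + fp)               # share of non-crashes correctly rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)  # share of all cases classified correctly
    return sensitivity, specificity, accuracy


def shapley_values(model, x, baseline):
    """Exact Shapley attributions of model(x) relative to one baseline input.

    Simplifying assumption: a feature outside coalition S takes its baseline
    value, so v(S) = model(x on S, baseline elsewhere). All 2^n coalitions are
    enumerated, so this only suits a handful of features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk_score(z):
    """Hypothetical risk score: phone-use events, segment length (km), AADT (thousands)."""
    phone_use, length_km, aadt_k = z
    return 0.5 * phone_use + 0.2 * length_km + 0.1 * aadt_k + 0.05 * phone_use * aadt_k


if __name__ == "__main__":
    # Hypothetical counts, not the study's results.
    print(classification_metrics(tp=90, fn=10, tn=80, fp=20))

    x = [3.0, 2.0, 10.0]
    baseline = [0.0, 1.0, 5.0]
    phi = shapley_values(toy_risk_score, x, baseline)
    print(phi)
    # Efficiency: attributions sum to model(x) - model(baseline).
    print(sum(phi), toy_risk_score(x) - toy_risk_score(baseline))
```

Because exact enumeration visits every feature coalition, its cost grows exponentially with the number of features. The efficiency check at the end (attributions summing to `model(x) - model(baseline)`) is the property that lets SHAP values be read as additive per-feature contributions to a single prediction. Note how the interaction term between phone use and traffic volume is split between those two features only.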