Examining imbalanced classification algorithms in predicting real-time traffic crash risk.

Authors: Peng Y; Tongji University, School of Transportation Engineering, China. Electronic address: yichuanpeng1982@hotmail.com., Li C; Tongji University, School of Transportation Engineering, China. Electronic address: 1731304@tongji.edu.cn., Wang K; Tongji University, School of Transportation Engineering, China. Electronic address: kew@tongji.edu.cn., Gao Z; Tongji University, School of Software Engineering, China. Electronic address: gaozhen@tongji.edu.cn., Yu R; Tongji University, School of Transportation Engineering, China. Electronic address: yurongjie@tongji.edu.cn.
Language: English
Source: Accident; analysis and prevention [Accid Anal Prev] 2020 Sep; Vol. 144, pp. 105610. Date of Electronic Publication: 2020 Jun 16.
DOI: 10.1016/j.aap.2020.105610
Abstract: Active Traffic Management (ATM) systems are widely used in the United States and Europe to improve traffic safety on urban expressways. Accurate real-time crash risk prediction is fundamental to their operation. Crashes are rare events, so crash data pose a typical imbalanced data classification problem. Most previous studies improved prediction methods at only the data level or the algorithm level, which may be inadequate for predicting crash risk accurately, especially in a continuous real-time traffic data environment. This research examined a comprehensive imbalanced classification approach to build a more accurate real-time traffic crash risk prediction model. At the output level, the Youden index method was shown to perform best at dividing the prediction results, and a probability calibration method was proposed to optimize the prediction results further. At the data level, under-sampling and the Synthetic Minority Oversampling Technique (SMOTE) were compared as ways to address the imbalanced data classification problem by changing the data distribution. At the algorithm level, the cost-sensitive MLP algorithm and the AdaBoost algorithm were examined. Finally, the random sampling cost-sensitive MLP model (RCSMLP) and the RUSBoost model were constructed by combining the optimization methods from the three levels. The RCSMLP model reached a sensitivity of 78.10% and a specificity of 81.44%. The RUSBoost model reached an AUC of 0.892, a sensitivity of 0.842, and a specificity of 0.816, a better performance on the imbalanced traffic crash risk prediction problem than existing prediction models. The proposed method for improving prediction accuracy is general and can be applied to many other models for predicting real-time traffic crash risk.
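Two of the techniques named in the abstract can be illustrated with a minimal sketch: random under-sampling at the data level and Youden index threshold selection at the output level. This is not the authors' implementation. The function names `random_undersample` and `youden_threshold`, the label convention (1 = crash, 0 = non-crash), and the toy scores are assumptions made for illustration only.

```python
import random


def random_undersample(samples, labels, seed=0):
    """Randomly drop majority-class (label 0) items until both classes are equal in size.

    Hypothetical helper: assumes label 1 is the rare crash class.
    """
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    kept = sorted(minority + rng.sample(majority, min(len(minority), len(majority))))
    return [samples[i] for i in kept], [labels[i] for i in kept]


def youden_threshold(y_true, y_score):
    """Return (threshold, J, sensitivity, specificity) for the cut-off that
    maximizes Youden's J = sensitivity + specificity - 1.

    A score >= threshold is classified as positive (crash).
    """
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    best = None
    for t in sorted(set(y_score)):  # every observed score is a candidate cut-off
        tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s < t)
        sens = tp / positives
        spec = tn / negatives
        j = sens + spec - 1
        if best is None or j > best[1]:
            best = (t, j, sens, spec)
    return best


# Toy data (made up): six non-crash records and two crash records.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
y_score = [0.05, 0.10, 0.20, 0.30, 0.35, 0.60, 0.40, 0.80]

threshold, j, sens, spec = youden_threshold(y_true, y_score)
balanced_x, balanced_y = random_undersample(list(range(len(y_true))), y_true)
```

On the toy data the default cut-off of 0.5 would miss the crash record scored 0.40, while the Youden cut-off trades one false alarm for full sensitivity. Under-sampling is shown on the same list only for brevity; in practice it would normally be applied to training data before fitting a model.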
Competing Interests: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
(Copyright © 2020 Elsevier Ltd. All rights reserved.)
Database: MEDLINE