Overfitting Prevention in Accident Prediction Models: Bayesian Regularization of Artificial Neural Networks

Authors: Nicholas Fiorentini, Diletta Pellegrini, Massimo Losa
Publication year: 2022
Subject:
Source: Transportation Research Record: Journal of the Transportation Research Board. 2677:1455-1470
ISSN: 2169-4052, 0361-1981
DOI: 10.1177/03611981221111367
Description: In the present paper, we implemented the Bayesian regularization (BR) backpropagation algorithm to calibrate an artificial neural network (ANN) as an accident prediction model (APM) for Italian four-lane divided roads. We chose the BR-ANN because it handles small sample sizes efficiently and avoids overfitting by adding a regularization term to the objective function minimized during training. Moreover, BR-ANNs are rarely used in road safety analyses, and their distinctive features deserve emphasis. In our work, the BR-ANN predicts the number of fatal and injury (FI) crashes across 236 road elements with a total length of 78 km. The input features are road element length, horizontal and vertical alignment, cross-section geometry, operating speed, traffic flow, sight distance, and road area type (i.e., a categorical predictor accounting for the potential influence of merge and diverge influence areas). The training and test phases of the BR-ANN were evaluated using the coefficient of determination (R²), root mean square error (RMSE), overfitting ratio (OR), scatterplots, and residual analysis, and by comparison with the same ANN architecture trained with the gradient descent (GD) with momentum and adaptive learning rate backpropagation algorithm (GD-ANN). Results demonstrate that the BR-ANN markedly outperforms the GD-ANN, which suffers from severe overfitting. Furthermore, the BR-ANN does not overfit the data (OR close to unity), reports a satisfactory R² (0.726), and shows a Gaussian residual distribution with zero mean. Therefore, road authorities could consider regularized ANNs for performing appropriate safety analyses, especially when dealing with small road sample sizes.
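The regularization term and the evaluation metrics named in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the textbook form of the Bayesian-regularized objective, F = beta * E_D + alpha * E_W, with fixed alpha and beta (a full BR procedure re-estimates them during training). It also assumes that the overfitting ratio is the test RMSE divided by the training RMSE, a definition the record does not state. All function names are hypothetical.

```python
import numpy as np


def regularized_objective(residuals, weights, alpha, beta):
    """Assumed BR objective: F = beta * E_D + alpha * E_W.

    E_D is the sum of squared prediction errors and E_W the sum of
    squared network weights. alpha and beta are held fixed here.
    """
    e_d = float(np.sum(np.square(np.asarray(residuals, dtype=float))))
    e_w = float(np.sum(np.square(np.asarray(weights, dtype=float))))
    return beta * e_d + alpha * e_w


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)


def overfitting_ratio(rmse_test, rmse_train):
    """ASSUMPTION: OR = test RMSE / training RMSE (not defined in the record)."""
    return rmse_test / rmse_train
```

Under this assumed definition, an OR near 1 means the test error is about the same as the training error, which matches the description's reading of "OR close to unity" as absence of overfitting.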
Database: OpenAIRE