Bayes classifiers for imbalanced traffic accidents datasets

Authors: Laura Garach, Griselda López, Randa Oqab Mujalli
Language: English
Year of publication: 2016
Subject:
Engineering
Poison control
Datasets as Topic
Human Factors and Ergonomics
02 engineering and technology
computer.software_genre
INGENIERIA E INFRAESTRUCTURA DE LOS TRANSPORTES
Accident (fallacy)
Bayes' theorem
0502 economics and business
Statistics
Injury prevention
0202 electrical engineering, electronic engineering, information engineering
Humans
Cities
Safety, Risk, Reliability and Quality
Weather
SMOTE
050210 logistics & transportation
Jordan
Trauma Severity Indices
business.industry
Traffic accidents
Speed limit
05 social sciences
Public Health, Environmental and Occupational Health
Accidents, Traffic
Bayesian network
Bayes Theorem
Urban area
Imbalanced data set
Causality
Statistical classification
Bayesian networks
Wounds and Injuries
020201 artificial intelligence & image processing
Environment Design
Data mining
business
computer
Algorithms
Source: RiuNet. Repositorio Institucional de la Universitat Politècnica de València
DOI: 10.1016/j.aap.2015.12.003
Description: Traffic accident data sets are usually imbalanced: the number of instances classified under the killed or severe injuries class (minority) is much lower than the number classified under the slight injuries class (majority). This poses a challenging problem for classification algorithms and may yield a model that covers the slight injuries instances well while frequently misclassifying the killed or severe injuries instances. Based on traffic accident data collected on urban and suburban roads in Jordan over three years (2009-2011), three different data balancing techniques were used: undersampling, which removes some instances of the majority class; oversampling, which creates new instances of the minority class; and a mixed technique that combines both. In addition, different Bayes classifiers were compared on the imbalanced and balanced data sets (Averaged One-Dependence Estimators, Weightily Averaged One-Dependence Estimators, and Bayesian networks) in order to identify factors that affect the severity of an accident. The results indicated that using the balanced data sets, especially those created using oversampling techniques, with Bayesian networks improved the classification of traffic accidents by severity and reduced the misclassification of killed and severe injuries instances. In addition, the following variables were found to contribute to the occurrence of a fatality or a severe injury in a traffic accident: number of vehicles involved, accident pattern, number of directions, accident type, lighting, surface condition, and speed limit. This work, to the knowledge of the authors, is the first to analyze historical traffic accident records from Jordan and the first to apply balancing techniques to analyze the injury severity of traffic accidents. © 2015 Elsevier Ltd. All rights reserved.
The authors are grateful to the Police Traffic Department in Jordan for providing the data necessary for this research. Griselda López wishes to thank the Regional Ministry of Economy, Innovation and Science of the regional government of Andalusia (Spain) for its scholarship to train teachers and researchers in Deficit Areas, which made this work possible. The authors appreciate the reviewers' comments and their efforts to improve the paper.
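The three balancing strategies named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' procedure: the record does not give their exact algorithms or parameters, so the oversampling step here simply duplicates random minority rows (a simpler stand-in for techniques that synthesize new instances, such as SMOTE), and the toy data, the function names, and the midpoint target used for the mixed strategy are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def split_by_class(rows, label_of):
    """Group rows by their class label."""
    groups = defaultdict(list)
    for row in rows:
        groups[label_of(row)].append(row)
    return groups


def resample_to(rows, label_of, target, rng):
    """Bring every class to exactly `target` rows.

    Larger classes are randomly reduced (undersampling); smaller classes
    are padded with random duplicates (a simplified form of oversampling).
    """
    out = []
    for group in split_by_class(rows, label_of).values():
        if len(group) >= target:
            out.extend(rng.sample(group, target))
        else:
            out.extend(group)
            out.extend(rng.choices(group, k=target - len(group)))
    return out


def class_sizes(rows, label_of):
    return [len(g) for g in split_by_class(rows, label_of).values()]


def undersample(rows, label_of, rng):
    # Shrink every class to the size of the smallest one.
    return resample_to(rows, label_of, min(class_sizes(rows, label_of)), rng)


def oversample(rows, label_of, rng):
    # Grow every class to the size of the largest one.
    return resample_to(rows, label_of, max(class_sizes(rows, label_of)), rng)


def mixed(rows, label_of, rng):
    # Assumption: meet in the middle between smallest and largest class.
    sizes = class_sizes(rows, label_of)
    return resample_to(rows, label_of, (min(sizes) + max(sizes)) // 2, rng)


# Hypothetical toy data: 90 "slight" records and 10 "severe" records.
records = [(i, "slight") for i in range(90)] + [(i, "severe") for i in range(90, 100)]
label = lambda row: row[1]
rng = random.Random(0)

under_counts = Counter(label(r) for r in undersample(records, label, rng))
over_counts = Counter(label(r) for r in oversample(records, label, rng))
mixed_counts = Counter(label(r) for r in mixed(records, label, rng))
print(under_counts, over_counts, mixed_counts)
```

On this toy input each strategy yields equal class sizes: 10 per class after undersampling, 90 after oversampling, and 50 for the mixed variant. Any classifier would then be trained on the resampled rows and evaluated on untouched data.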
Database: OpenAIRE