Bayes classifiers for imbalanced traffic accidents datasets
Author: | Laura Garach, Griselda López, Randa Oqab Mujalli |
---|---|
Language: | English |
Year of publication: | 2016 |
Subject: |
Engineering; Poison control; Datasets as Topic; Human Factors and Ergonomics; 02 engineering and technology; computer.software_genre; INGENIERIA E INFRAESTRUCTURA DE LOS TRANSPORTES; Accident (fallacy); Bayes' theorem; 0502 economics and business; Statistics; Injury prevention; 0202 electrical engineering, electronic engineering, information engineering; Humans; Cities; Safety, Risk, Reliability and Quality; Weather; SMOTE; 050210 logistics & transportation; Jordan; Trauma Severity Indices; business.industry; Traffic accidents; Speed limit; 05 social sciences; Public Health, Environmental and Occupational Health; Accidents, Traffic; Bayesian network; Bayes Theorem; Urban area; Imbalanced data set; Causality; Statistical classification; Bayesian networks; Wounds and Injuries; 020201 artificial intelligence & image processing; Environment Design; Data mining; business; computer; Algorithms |
Source: | RiuNet. Repositorio Institucional de la Universitat Politècnica de València |
DOI: | 10.1016/j.aap.2015.12.003 |
Description: | [EN] Traffic accident data sets are usually imbalanced: the number of instances in the killed or severe injuries class (minority) is much lower than the number in the slight injuries class (majority). This poses a challenging problem for classification algorithms and may yield a model that covers the slight injuries instances well while frequently misclassifying the killed or severe injuries instances. Based on traffic accident data collected on urban and suburban roads in Jordan over three years (2009-2011), three data balancing techniques were used: undersampling, which removes some instances of the majority class; oversampling, which creates new instances of the minority class; and a mixed technique that combines both. In addition, different Bayes classifiers were compared on the imbalanced and balanced data sets (Averaged One-Dependence Estimators, Weightily Averaged One-Dependence Estimators, and Bayesian networks) in order to identify factors that affect the severity of an accident. The results indicated that using the balanced data sets, especially those created with oversampling techniques, together with Bayesian networks improved the classification of a traffic accident according to its severity and reduced the misclassification of killed and severe injuries instances. In addition, the following variables were found to contribute to the occurrence of a fatality or a severe injury in a traffic accident: number of vehicles involved, accident pattern, number of directions, accident type, lighting, surface condition, and speed limit. This work is, to the knowledge of the authors, the first that analyzes historical records of traffic accidents occurring in Jordan and the first to apply balancing techniques to the analysis of injury severity in traffic accidents. (C) 2015 Elsevier Ltd. All rights reserved. 
The authors are grateful to the Police Traffic Department in Jordan for providing the data necessary for this research. Griselda López wishes to thank the regional ministry of Economy, Innovation and Science of the regional government of Andalusia (Spain) for its scholarship to train teachers and researchers in Deficit Areas, which has made this work possible. The authors appreciate the reviewers' comments and their effort to improve the paper. |
Database: | OpenAIRE |
External link: |
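
The three balancing strategies named in the description (undersampling, oversampling, and a mix of both) can be illustrated with a minimal Python sketch. This is not the authors' implementation: it uses plain random removal and random duplication (the subject keywords mention SMOTE, which instead synthesizes new minority instances), and the toy data, class sizes, and the helper names `undersample`, `oversample`, and `label_of` are assumptions made for illustration only.

```python
import random
from collections import Counter


def undersample(rows, label_of, majority, target, rng):
    """Randomly keep only `target` rows of the majority class."""
    maj = [r for r in rows if label_of(r) == majority]
    rest = [r for r in rows if label_of(r) != majority]
    return rest + rng.sample(maj, min(target, len(maj)))


def oversample(rows, label_of, minority, target, rng):
    """Randomly duplicate minority rows until `target` of them exist."""
    mino = [r for r in rows if label_of(r) == minority]
    if not mino:
        return list(rows)  # nothing to duplicate
    extra = [rng.choice(mino) for _ in range(max(0, target - len(mino)))]
    return list(rows) + extra


def label_of(row):
    """Hypothetical accessor: the class label is the first field."""
    return row[0]


# Toy imbalanced data set (assumed sizes): 90 slight vs 10 severe.
rng = random.Random(0)
rows = [("slight", i) for i in range(90)] + [("severe", i) for i in range(10)]

under = undersample(rows, label_of, "slight", 10, rng)
over = oversample(rows, label_of, "severe", 90, rng)
# Mixed strategy: shrink the majority, then grow the minority.
mixed = oversample(
    undersample(rows, label_of, "slight", 50, rng),
    label_of, "severe", 50, rng,
)

for name, data in [("under", under), ("over", over), ("mixed", mixed)]:
    print(name, dict(Counter(label_of(r) for r in data)))
```

Each balanced set could then be passed to a classifier of choice; the record itself does not specify the sampling ratios the authors used, so the targets above are placeholders.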