Impact of Data Quality and Target Representation on Predictions for Urban Bus Networks

Authors: Marcin Budka, Thilo Reich, David Hulbert
Year of publication: 2020
Subject:
Source: SSCI
DOI: 10.1109/ssci47803.2020.9308166
Description: Passengers of urban bus networks often rely on forecasts of Estimated Times of Arrival (ETA) and live vehicle movements to plan their journeys. ETA predictions are unreliable due to the lack of good-quality historical data, while ‘live’ positions in mobile apps suffer from delays in data transmission. This study uses deep neural networks to predict the next position of a bus under various vehicle-location data-quality regimes. Additionally, we assess the effect of the target representation in the prediction problem by encoding it as unconstrained geographical coordinates, progress along a known trajectory, or ETA at the next two stops. We demonstrate that without data cleaning, model predictions give false confidence if mean errors are used, highlighting the importance of a holistic assessment of the results. We show that target representation affects the prediction accuracy by constraining the prediction space. The literature is vague about quality issues in public transport data. Here we show that noisy data is a problem and discuss simple but effective approaches to address these issues. Research generally focuses on only a single method of target representation; comparing several methods is therefore a useful addition to the literature. This gives insight into the value of addressing data quality issues in urban transport data to enable better predictions and improve the passenger experience. We show that ‘rephrasing’ the prediction problem by changing the target representation can yield massively improved predictions. Our findings enable researchers using deep learning approaches in public transport to make more informed decisions about essential data cleaning steps and problem representation for improved results.
Database: OpenAIRE