Impact of Data Quality and Target Representation on Predictions for Urban Bus Networks
Authors: | Marcin Budka, Thilo Reich, David Hulbert |
---|---|
Year of publication: | 2020 |
Subject: |
05 social sciences; 0502 economics and business; 050210 logistics & transportation; 02 engineering and technology; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing
Computer science; Reliability (computer networking); Deep learning; Machine learning; Public transport; Data integrity; Encoding (memory); Data quality; Quality (business); Artificial intelligence; Representation (mathematics) |
Source: | SSCI |
DOI: | 10.1109/ssci47803.2020.9308166 |
Description: | Passengers of urban bus networks often rely on forecasts of Estimated Times of Arrival (ETA) and live vehicle movements to plan their journeys. ETA predictions are unreliable due to the lack of good-quality historical data, while 'live' positions in mobile apps suffer from delays in data transmission. This study uses deep neural networks to predict the next position of a bus under various vehicle-location data-quality regimes. Additionally, we assess the effect of the target representation in the prediction problem by encoding it as unconstrained geographical coordinates, as progress along a known trajectory, or as ETA at the next two stops. We demonstrate that without data cleaning, model predictions give false confidence if mean errors are used, highlighting the importance of a holistic assessment of the results. We show that target representation affects prediction accuracy by constraining the prediction space. The literature is vague about quality issues in public transport data. Here we show that noisy data is a problem and discuss simple but effective approaches to address these issues. Research generally focuses on only a single method of target representation; comparing several methods is therefore a useful addition to the literature. This gives insight into the value of addressing data quality issues in urban transport data to enable better predictions and improve the passenger experience. We show that 'rephrasing' the prediction problem by changing the target representation can yield massively improved predictions. Our findings enable researchers using deep learning approaches in public transport to make more informed decisions about essential data cleaning steps and problem representation for improved results. |
Database: | OpenAIRE |
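
Illustrative note: the abstract contrasts unconstrained geographical coordinates with "progress along known trajectory" as a prediction target. The record does not say how the authors computed that encoding, so the following is only a minimal sketch of one plausible approach: project a reported position onto a route polyline and express it as a fraction of total route length. The function name `progress_along_route`, the planar metric coordinates, and the sample route are all assumptions made for illustration, not details from the paper.

```python
import math


def progress_along_route(point, route):
    """Return the fraction (0.0 to 1.0) of the route length at which
    `point` projects onto the polyline `route`.

    Assumption: `point` is (x, y) and `route` is a list of (x, y) vertices
    in a planar, metric coordinate system (not raw latitude/longitude).
    """
    px, py = point
    segments = list(zip(route, route[1:]))
    seg_lengths = [math.dist(a, b) for a, b in segments]
    total = sum(seg_lengths)
    if total == 0:
        raise ValueError("route has zero length")

    best_dist = math.inf
    best_progress = 0.0
    travelled = 0.0  # route length covered before the current segment
    for ((ax, ay), (bx, by)), seg_len in zip(segments, seg_lengths):
        if seg_len > 0:
            # Parameter of the orthogonal projection, clamped to the segment.
            t = ((px - ax) * (bx - ax) + (py - ay) * (by - ay)) / seg_len ** 2
            t = min(1.0, max(0.0, t))
        else:
            t = 0.0
        qx, qy = ax + t * (bx - ax), ay + t * (by - ay)
        dist = math.hypot(px - qx, py - qy)
        if dist < best_dist:
            best_dist = dist
            best_progress = (travelled + t * seg_len) / total
        travelled += seg_len
    return best_progress


# Hypothetical L-shaped route, 200 m long, with a slightly noisy position fix.
route = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0)]
print(progress_along_route((50.0, 5.0), route))
```

Under this encoding the target is a single bounded scalar, so a prediction cannot fall off the route, which is one way to read the abstract's remark that the target representation constrains the prediction space.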