Abstract: |
Recognition of spatio-temporal traffic patterns at the network-wide level plays an important role in data-driven intelligent transport systems (ITS) and serves as a basis for applications such as short-term prediction and scenario-based traffic management. Common practice in the transport literature is to rely on well-known general-purpose unsupervised machine-learning methods (e.g., k-means, hierarchical, spectral, DBSCAN) to select the most representative clustering structure and number of day-types based solely on internal evaluation indices. These indices are easy to calculate but are limited because they use only the information contained in the clustered dataset itself. Ideally, however, clustering quality should be demonstrated by external validation criteria, such as expert assessment or performance in the intended application. The main contribution of this paper is to test and compare the common practice of internal validation against external validation criteria, represented here by the application to short-term prediction, which also serves as a proxy for more general traffic management applications. Compared with external evaluation based on short-term prediction, internal evaluation methods tend to underestimate the number of representative day-types needed for the application. The paper additionally investigates the impact of dimensionality reduction. Using just 0.1% of the original dataset dimensions, very similar clustering and prediction performance can be achieved at up to 20 times lower computational cost, depending on the clustering method. K-means and agglomerative clustering may be the most scalable methods, using up to 60 times fewer computational resources than p-median clustering while achieving very similar prediction performance. [ABSTRACT FROM AUTHOR] |