A Heuristic Approach for On-line Discovery of Unidentified Spatial Clusters from Grid-Based Streaming Algorithms
Autor: | Francisco José da Silva e Silva, Marcos Roriz Junior, Hélio Lopes, Markus Endler, Marco A. Casanova |
---|---|
Rok vydání: | 2016 |
Předmět: |
Weight function
Similarity (geometry) business.industry Computer science Heuristic (computer science) Heuristic Process (computing) 020207 software engineering 02 engineering and technology Machine learning computer.software_genre Grid Position (vector) 020204 information systems Line (geometry) 0202 electrical engineering electronic engineering information engineering Artificial intelligence business Streaming algorithm computer Algorithm |
Zdroj: | Big Data Analytics and Knowledge Discovery ISBN: 9783319439457 DaWaK |
DOI: | 10.1007/978-3-319-43946-4_9 |
Popis: | On-line spatial clustering of large position streams are useful for several applications, such as monitoring urban traffic and massive events. To rapidly and timely detect in real-time these spatial clusters, algorithms explored grid-based approaches, which segments the spatial domain into discrete cells. The primary benefit of this approach is that it switches the costly distance comparison of density-based algorithms to counting the number of moving objects mapped to each cell. However, during this process, the algorithm may fail to identify clusters of spatially and temporally close moving objects that get mapped to adjacent cells. To overcome this answer loss problem, we propose a density heuristic that is sensible to moving objects in adjacent cells. The heuristic further subdivides each cell into inner slots. Then, we calculate the density of a cell by adding the object count of the cell itself with the object count of the inner slots of its adjacent cells, using a weight function. To avoid collateral effects and detecting incorrect clusters, we apply the heuristic only to transient cells, that is, cells whose density are less than, but close to the density threshold value. We evaluate our approach using real-world datasets and explore how different transient thresholds and the number of inner slots influence the similarity and the number of detected, correct and incorrect, and undetected clusters when compared to the baseline result. |
Databáze: | OpenAIRE |
Externí odkaz: |