Distributed Mining of Spatial High Utility Itemsets in Very Large Spatiotemporal Databases using Spark In-Memory Computing Architecture
Author: | Truong Cong Thang, R. Uday Kiran, Yutaka Watanobe, Incheon Paik, Cheng-Wei Wu, Koji Zettsu, Minh-Son Dao, Sadanori Ito |
---|---|
Publication year: | 2020 |
Subject: |
Spatiotemporal database; Computer science; Big data; Fault tolerance; Engineering and technology; In-Memory Processing; Distributed algorithm; Information systems; Spark (mathematics); Scalability; Electrical engineering, electronic engineering, information engineering; Artificial intelligence & image processing; Pruning (decision trees); Data mining |
Source: | IEEE BigData |
Description: | Finding Spatial High Utility Itemsets (SHUIs) in a spatiotemporal database is a challenging problem of great importance in many real-world applications. Most previous works focused on the sequential discovery of SHUIs in a database running on a single machine. Consequently, these works are not suitable for big data (or cloud-based) applications, as they suffer from scalability and fault-tolerance problems. This paper proposes several novel pruning techniques to reduce the search space and presents a more flexible distributed algorithm to find all desired itemsets from the database using the Spark in-memory computing architecture. Our algorithm inherits several advantages of Spark, including low communication cost, fault tolerance, and high scalability. Experimental results demonstrate that the proposed algorithm has good scalability and performance on very large databases. Finally, we present a real-world navigation application in which SHUIs generated from traffic congestion data have been employed to recommend alternative routes to users. |
Database: | OpenAIRE |
External link: |