Distributed Mining of Spatial High Utility Itemsets in Very Large Spatiotemporal Databases using Spark In-Memory Computing Architecture

Authors:	Truong Cong Thang, R. Uday Kiran, Yukata Watanobe, Incheon Paik, Cheng-Wei Wu, Koji Zettsu, Minh-Son Dao, Sadanori Ito
Year of publication:	2020
Subject:	Spatiotemporal database; Computer science; Big data; Fault tolerance; Engineering and technology; In-memory processing; Distributed algorithm; Information systems; Spark (mathematics); Scalability; Electrical engineering, electronic engineering, information engineering; Artificial intelligence & image processing; Pruning (decision trees); Data mining
Source:	IEEE BigData
Description:	Finding Spatial High Utility Itemsets (SHUIs) in a spatiotemporal database is a challenging problem of great importance in many real-world applications. Most previous works focused on the sequential discovery of SHUIs in a database running on a single machine. Consequently, these works are not suitable for big data (or cloud-based) applications, as they suffer from scalability and fault-tolerance problems. This paper proposes several novel pruning techniques to reduce the search space and presents a more flexible distributed algorithm that finds all desired itemsets in the database using the Spark in-memory computing architecture. Our algorithm inherits several advantages of Spark, including low communication cost, fault tolerance, and high scalability. Experimental results demonstrate that the proposed algorithm scales and performs well on very large databases. Finally, we present a real-world navigation application in which SHUIs generated from traffic congestion data are used to recommend alternative routes to users.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::2c2879ce67d8e4c2179eaba6d0494c30 https://doi.org/10.1109/bigdata50022.2020.9377946