Efficient Processing of Analytical Queries Extended with Similarity Search Predicates over Images in Spark

Author:	Guilherme Muzzi da Rocha, Cristina Dutra de Aguiar Ciferri
Year of publication:	2020
Source:	Journal of Information and Data Management. 11
ISSN:	2178-7107
DOI:	10.5753/jidm.2020.2019
Description:	Image data warehousing extends conventional data warehousing to also manipulate images, which are represented by feature vectors and attributes for similarity search. A challenge that arises is the efficient processing of analytical queries extended with a similarity search predicate. These queries have a high computational cost because they require both costly star join operations and distance calculations in the same setting. We consider applications that manage huge volumes of data, where parallel and distributed data processing frameworks are needed. In this article, we introduce two methods to solve this challenge efficiently in Spark. BrOmnImg integrates the broadcast join technique, for processing the star join operation, with the Omni technique, for processing the distance calculations. BrOmnImgCF extends BrOmnImg by using the conventional predicate to further reduce the number of distance calculations. Compared with the closest method available in the literature, BrOmnImg reduced the time spent on query processing by up to about 65%. Compared with BrOmnImg, BrOmnImgCF improved performance by up to about 54%.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::1f167b233f1df231bb449abc34d12724 https://doi.org/10.5753/jidm.2020.2019
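
The description mentions two ideas that a small example may help make concrete: pruning distance calculations with precomputed distances to reference points, and applying the conventional predicate first so that fewer images reach the similarity step. The following is a minimal single-machine sketch in plain Python, not the authors' Spark implementation. It assumes that the pruning works through the triangle inequality on distances to a few pivot points, and it omits the broadcast star join entirely. All names (`build_pivot_table`, `range_query`, `allowed_keys`) are hypothetical and chosen only for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_pivot_table(vectors, pivots):
    """Precompute the distance from every feature vector to every pivot."""
    return {key: [euclidean(vec, p) for p in pivots] for key, vec in vectors.items()}


def range_query(query, radius, vectors, pivots, pivot_table, allowed_keys=None):
    """Return (keys within `radius` of `query`, number of full distances computed).

    `allowed_keys` is a hypothetical stand-in for the image keys that survive
    the conventional (non-similarity) predicate; None means no such filter.
    """
    query_to_pivots = [euclidean(query, p) for p in pivots]
    hits, computed = [], 0
    for key, vec in vectors.items():
        # Conventional predicate first: skip images it already excludes.
        if allowed_keys is not None and key not in allowed_keys:
            continue
        # Triangle inequality: |d(q, p) - d(o, p)| <= d(q, o) for any pivot p,
        # so a gap larger than the radius rules the image out without
        # computing d(q, o).
        if any(abs(dq - do) > radius
               for dq, do in zip(query_to_pivots, pivot_table[key])):
            continue
        # Refinement step: compute the real distance for surviving candidates.
        computed += 1
        if euclidean(query, vec) <= radius:
            hits.append(key)
    return sorted(hits), computed


# Toy data: image key -> 2-D feature vector (assumed values for illustration).
vectors = {1: (0.0, 0.0), 2: (1.0, 0.0), 3: (10.0, 10.0), 4: (0.5, 0.5)}
pivots = [(0.0, 0.0), (10.0, 0.0)]
table = build_pivot_table(vectors, pivots)

print(range_query((0.0, 0.0), 1.0, vectors, pivots, table))
print(range_query((0.0, 0.0), 1.0, vectors, pivots, table, allowed_keys={2, 3}))
```

In this toy run, the pivot filter discards image 3 without a full distance calculation, and restricting the candidates to those that satisfy the conventional predicate reduces the number of full distance calculations further. In a distributed setting, the same filtering would presumably be applied per partition after the join, but that part is outside this sketch.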