SubspaceDB: In-database subspace clustering for analytical query processing

Authors:	M. R. Kaimal, Sandhya Harikumar
Year of publication:	2019
Subject:	Clustering high-dimensional data; SQL; Information Systems and Management; Database; Computer science; InformationSystems_DATABASEMANAGEMENT; computer.software_genre; User-defined function; Medoid; Set (abstract data type); Relational database management system; Tuple; computer; Subspace topology; computer.programming_language
Source:	Data & Knowledge Engineering. 121:109-129
ISSN:	0169-023X
DOI:	10.1016/j.datak.2019.05.003
Description:	High-dimensional data analysis within relational database management systems (RDBMS) is challenging because SQL offers inadequate support for it. Currently, subspace clustering of high-dimensional data is implemented either outside the DBMS using wrapper code or inside the DBMS using SQL User-Defined Functions/Aggregates (UDFs/UDAs). However, both approaches have potential disadvantages in performance, resource usage, and security for voluminous and frequently updated data. Hence, we propose an efficient querying system, named SubspaceDB, that implements subspace clustering directly within an RDBMS. SubspaceDB provides a novel set of query operators, each with an optimization objective, to facilitate interactive analysis for subspace clustering. The query operators focus on retrieving optimal answers to four key query types: (a) medoid queries, (b) neighbourhood queries, (c) partial similarity queries, and (d) prominence queries, which aid the formation of subspace clusters. Experimental studies on real and synthetic databases of size 15 M tuples and 104 attributes show that SubspaceDB can be over 10 times faster than a conventional wrapper-based or SQL UDF approach. The proposed approach is also efficient in retrieving at least 50% of the data, with a performance improvement of at least 25%.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::221a9681ceb51b9e50f8c24708a2b2a3 https://doi.org/10.1016/j.datak.2019.05.003
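
Illustrative note: the abstract names medoid and neighbourhood queries but does not give their definitions or syntax. The sketch below is not SubspaceDB's operator set; it only shows, under stated assumptions, what such queries could look like when expressed as plain SQL evaluated inside a database engine. The assumptions are: SQLite (via Python's standard `sqlite3` module) as the engine, Manhattan distance as the dissimilarity, a subspace given as a list of column names, and a hypothetical table `pts` with made-up values.

```python
import sqlite3


def _distance_expr(attrs):
    # Manhattan distance between rows a and b, restricted to the chosen
    # subspace (attrs). Identifiers are interpolated into the SQL text, so
    # they must come from trusted code, never from user input.
    return " + ".join(f"ABS(a.{c} - b.{c})" for c in attrs)


def medoid_in_subspace(conn, table, id_col, attrs):
    """Id of the row with the smallest total distance to all rows (assumed medoid definition)."""
    sql = (
        f"SELECT a.{id_col}, SUM({_distance_expr(attrs)}) AS total "
        f"FROM {table} a CROSS JOIN {table} b "
        f"GROUP BY a.{id_col} "
        f"ORDER BY total ASC, a.{id_col} ASC LIMIT 1"
    )
    return conn.execute(sql).fetchone()[0]


def neighbours_in_subspace(conn, table, id_col, attrs, center_id, radius):
    """Ids of rows within `radius` of row `center_id`, measured only on attrs."""
    sql = (
        f"SELECT a.{id_col} FROM {table} a "
        f"JOIN {table} b ON b.{id_col} = ? "
        f"WHERE {_distance_expr(attrs)} <= ? "
        f"ORDER BY a.{id_col}"
    )
    return [row[0] for row in conn.execute(sql, (center_id, radius))]


def make_demo_db():
    # Hypothetical toy data: x and y form one tight group plus an outlier,
    # while z is unrelated to that grouping.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pts (id INTEGER PRIMARY KEY, x REAL, y REAL, z REAL)")
    conn.executemany(
        "INSERT INTO pts VALUES (?, ?, ?, ?)",
        [
            (1, 0.0, 1.0, 100.0),
            (2, 1.0, 1.0, 0.0),
            (3, 2.0, 1.0, 50.0),
            (4, 3.0, 1.0, 3.0),
            (5, 10.0, 1.0, 1.0),
        ],
    )
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    print(medoid_in_subspace(conn, "pts", "id", ["x", "y"]))
    print(medoid_in_subspace(conn, "pts", "id", ["z"]))
    print(neighbours_in_subspace(conn, "pts", "id", ["x", "y"], 3, 1.5))
```

In this toy data the medoid depends on the subspace chosen: restricting the distance to `x` and `y` and restricting it to `z` select different rows. The self-join is quadratic in the number of tuples, which is one reason a naive SQL formulation like this would not scale to the data sizes reported in the abstract.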