SubspaceDB: In-database subspace clustering for analytical query processing
Authors: | M. R. Kaimal, Sandhya Harikumar |
---|---|
Publication year: | 2019 |
Subject: | Clustering high-dimensional data; SQL; Information Systems and Management; Database; Computer science; InformationSystems_DATABASEMANAGEMENT; computer.software_genre; User-defined function; Medoid; Set (abstract data type); Relational database management system; Tuple; computer; Subspace topology; computer.programming_language |
Source: | Data & Knowledge Engineering. 121:109-129 |
ISSN: | 0169-023X |
DOI: | 10.1016/j.datak.2019.05.003 |
Description: | High-dimensional data analysis within relational database management systems (RDBMS) is challenging because SQL offers inadequate support for it. Currently, subspace clustering of high-dimensional data is implemented either outside the DBMS using wrapper code or inside the DBMS using SQL User Defined Functions/Aggregates (UDFs/UDAs). However, both approaches have potential disadvantages in terms of performance, resource usage, and security for voluminous and frequently updated data. Hence, we propose an efficient querying system, named SubspaceDB, that implements subspace clustering directly within an RDBMS. SubspaceDB provides a novel set of query operators, each with an optimization objective, to facilitate interactive analysis for subspace clustering. The query operators focus on retrieving optimal answers to four key query types that aid the formation of subspace clusters: (a) Medoid queries, (b) Neighbourhood queries, (c) Partial similarity queries, and (d) Prominence queries. Experimental studies on real and synthetic databases of up to 15 million tuples and 104 attributes show that SubspaceDB can be over 10 times faster than a conventional wrapper-based or SQL UDF approach. The proposed approach is also efficient when retrieving at least 50% of the data, with a performance improvement of at least 25%. |
Database: | OpenAIRE |
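
To make two of the query types named in the description more concrete, the following is a minimal sketch of what a medoid query and a neighbourhood query compute when restricted to a subspace (a chosen subset of attributes). It is not SubspaceDB's implementation: the paper proposes dedicated query operators inside the RDBMS, whereas this sketch uses plain SQL through Python's standard `sqlite3` module. The table `points`, its columns, the Manhattan (L1) distance, and the function names `medoid`, `neighbourhood`, and `demo_connection` are all assumptions made for illustration.

```python
import sqlite3


def demo_connection():
    """Build a small in-memory table (hypothetical data for illustration)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE points (id INTEGER PRIMARY KEY, x REAL, y REAL, z REAL)")
    conn.executemany(
        "INSERT INTO points VALUES (?, ?, ?, ?)",
        [
            (1, 0.0, 1.0, 50.0),
            (2, 1.0, 1.0, -30.0),
            (3, 1.5, 1.0, 99.0),
            (4, 2.0, 1.0, 7.0),
            (5, 9.0, 9.0, 0.0),
        ],
    )
    return conn


def _l1(dims, left, right):
    # Manhattan distance over the chosen attributes only (assumed metric).
    # Identifiers are interpolated directly, so they must come from trusted code.
    return " + ".join(f"ABS({left}.{d} - {right}.{d})" for d in dims)


def medoid(conn, table, dims):
    """Return the id of the tuple with the smallest total distance to all tuples."""
    sql = (
        f"SELECT a.id, SUM({_l1(dims, 'a', 'b')}) AS cost "
        f"FROM {table} AS a, {table} AS b "
        f"GROUP BY a.id ORDER BY cost, a.id LIMIT 1"
    )
    return conn.execute(sql).fetchone()[0]


def neighbourhood(conn, table, dims, centre_id, eps):
    """Return ids of tuples within distance eps of the centre tuple, in the subspace."""
    sql = (
        f"SELECT a.id FROM {table} AS a, {table} AS c "
        f"WHERE c.id = ? AND a.id <> c.id AND {_l1(dims, 'a', 'c')} <= ? "
        f"ORDER BY a.id"
    )
    return [row[0] for row in conn.execute(sql, (centre_id, eps))]


if __name__ == "__main__":
    conn = demo_connection()
    print(medoid(conn, "points", ["x", "y"]))       # medoid in the (x, y) subspace
    print(medoid(conn, "points", ["x", "y", "z"]))  # medoid in the full space
    print(neighbourhood(conn, "points", ["x", "y"], 3, 1.0))
```

On this toy data the medoid differs between the `(x, y)` subspace and the full attribute space, which is the reason subspace clustering evaluates such queries per attribute subset. The self-join is quadratic in the number of tuples, so it only illustrates the semantics and says nothing about how SubspaceDB achieves its reported performance.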