STHist-C: a highly accurate cluster-based histogram for two and three dimensional geographic data points

Authors: Jaeho Kim, Yohan Roh, Myoung Ho Kim, Hai Thanh Mai
Year of publication: 2012
Subject:
Source: GeoInformatica. 17:325-352
ISSN: 1573-7624, 1384-6175
Description: Histograms have been widely used for estimating selectivity in query optimization. In this paper, we propose a new histogram construction method for geographic data objects, which are used in many real-world applications. The proposed method analyzes and exploits the clusters of objects that exist in a given data set in order to build histograms with significantly enhanced accuracy. Our philosophy is to allocate the histogram buckets to the subspaces that properly capture object clusters. We therefore first propose a procedure to find the centers of object clusters. We then propose an algorithm to construct the histogram buckets from these centers: the buckets are initialized at the cluster centers and then expanded to cover the clusters. The best expansion plans are chosen based on a notion of skewness gain. Results from extensive experiments using real-life data sets demonstrate that the proposed method improves histogram accuracy compared with the current state-of-the-art histogram construction method for geographic data objects.
Database: OpenAIRE
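
Note: The description says that histograms are used to estimate selectivity in query optimization. As a rough illustration of that general idea only, the sketch below estimates how many points fall inside a rectangular query from a set of two-dimensional buckets, assuming points are spread uniformly within each bucket. It is not the STHist-C construction algorithm; the names `Bucket`, `overlap_fraction`, and `estimate_count` are hypothetical, and the uniformity assumption is a common textbook simplification rather than something stated in the record.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Bucket:
    """Axis-aligned rectangle with the number of data points it contains."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    count: int


def overlap_fraction(b: Bucket, qx_min: float, qy_min: float,
                     qx_max: float, qy_max: float) -> float:
    """Fraction of the bucket's area covered by the query rectangle."""
    width = b.x_max - b.x_min
    height = b.y_max - b.y_min
    if width <= 0 or height <= 0:
        return 0.0  # degenerate bucket: treated as contributing nothing
    dx = min(b.x_max, qx_max) - max(b.x_min, qx_min)
    dy = min(b.y_max, qy_max) - max(b.y_min, qy_min)
    if dx <= 0 or dy <= 0:
        return 0.0  # no overlap between bucket and query
    return (dx * dy) / (width * height)


def estimate_count(buckets: Iterable[Bucket], qx_min: float, qy_min: float,
                   qx_max: float, qy_max: float) -> float:
    """Estimated number of points in the query rectangle.

    Assumes points are uniformly distributed inside each bucket, so each
    bucket contributes its count scaled by the overlapping area fraction.
    """
    return sum(
        b.count * overlap_fraction(b, qx_min, qy_min, qx_max, qy_max)
        for b in buckets
    )


# Illustrative data only: two buckets and one query rectangle.
example_buckets = [
    Bucket(0.0, 0.0, 10.0, 10.0, 100),
    Bucket(20.0, 20.0, 30.0, 30.0, 40),
]
example_estimate = estimate_count(example_buckets, 5.0, 5.0, 25.0, 25.0)
```

Under the uniformity assumption, the estimate is only as good as the fit between bucket boundaries and the actual data clusters, which is why the placement of buckets matters for accuracy.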