Content-Based Geospatial Schema Matching Using Semi-supervised Geosemantic Clustering and Hierarchy

Authors: Latifur Khan, Jeffrey Partyka
Publication year: 2011
Subject:
Source: ICSC
Description: The problem of determining semantic similarity across heterogeneous geospatial data sources continues to attract interest. Establishing semantic similarity across data sources typically involves 1:1 matching of attributes and their instances between tables. When clustering methods are used for this matching, three distinct challenges remain unaddressed. First, many clustering algorithms rely on only one instance property. Second, they do not produce a consistent score for an attribute match. Finally, they do not consider hierarchical relationships in the data. To address these challenges, we introduce GeoSim, a tool for determining the semantic similarity between geospatial schemas. GeoSim consists of two components, GeoSimG and GeoSimH. GeoSimG derives clusters from attribute instances based on their geographic and semantic properties. It examines the attribute instances in the clusters to calculate a consistent semantic similarity score through entropy-based distribution (EBD). GeoSimH also captures hierarchical relationships between the compared tables and attributes. Results from experiments involving multi-jurisdictional geospatial datasets show that GeoSim outperforms several popular semantic similarity approaches.
Database: OpenAIRE
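
Note: The abstract names entropy-based distribution (EBD) as the scoring step but does not define it. The following is a minimal sketch under one assumption, not taken from this record: that EBD is the ratio of conditional entropy to entropy, H(C|T) / H(C), where C is the cluster an instance falls in and T is the attribute it came from. Under that assumption, a score near 1 means the two attributes spread their instances across the clusters in the same way, and a score near 0 means each attribute occupies its own clusters. The function names and the toy cluster assignments are hypothetical, and the clustering itself is not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of discrete labels."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def conditional_entropy(clusters, attributes):
    """H(C|T): entropy of cluster labels, averaged over each attribute's instances."""
    n = len(clusters)
    by_attribute = {}
    for cluster, attribute in zip(clusters, attributes):
        by_attribute.setdefault(attribute, []).append(cluster)
    return sum(len(cs) / n * entropy(cs) for cs in by_attribute.values())


def ebd(clusters, attributes):
    """Assumed EBD definition: H(C|T) / H(C), a score in [0, 1]."""
    h_c = entropy(clusters)
    if h_c == 0:
        # Convention chosen for this sketch only: a single cluster means the
        # attributes cannot be told apart, so report full similarity.
        return 1.0
    return conditional_entropy(clusters, attributes) / h_c


# Toy data: instances of attributes "a" and "b" with their cluster ids.
mixed = ebd([0, 0, 1, 1], ["a", "b", "a", "b"])     # both attributes in both clusters
separate = ebd([0, 0, 1, 1], ["a", "a", "b", "b"])  # one cluster per attribute
print(mixed, separate)
```

In the first toy case each cluster holds one instance of each attribute, so knowing the attribute tells nothing about the cluster and the ratio is at its maximum. In the second case the attribute fully determines the cluster, so the conditional entropy vanishes.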