Cost-effective conceptual design using taxonomies
Autor: | Ali Vakilian, Yodsawalai Chodpathumwan, Amir Nayyeri, Arash Termehchy |
---|---|
Rok vydání: | 2018 |
Předmět: |
FOS: Computer and information sciences
Information retrieval Computer science Conceptual model (computer science) Databases (cs.DB) 02 engineering and technology Empirical research Computer Science - Databases Conceptual design Hardware and Architecture 020204 information systems Taxonomy (general) 0202 electrical engineering electronic engineering information engineering 020201 artificial intelligence & image processing Information Systems |
Zdroj: | The VLDB Journal. 27:369-394 |
ISSN: | 0949-877X 1066-8888 |
Popis: | It is known that annotating named entities in unstructured and semi-structured data sets by their concepts improves the effectiveness of answering queries over these data sets. As every enterprise has a limited budget of time or computational resources, it has to annotate a subset of concepts in a given domain whose costs of annotation do not exceed the budget. We call such a subset of concepts a {\it conceptual design} for the annotated data set. We focus on finding a conceptual design that provides the most effective answers to queries over the annotated data set, i.e., a {\it cost-effective conceptual design}. Since, it is often less time-consuming and costly to annotate general concepts than specific concepts, we use information on superclass/subclass relationships between concepts in taxonomies to find a cost-effective conceptual design. We quantify the amount by which a conceptual design with concepts from a taxonomy improves the effectiveness of answering queries over an annotated data set. If the taxonomy is a tree, we prove that the problem is NP-hard and propose an efficient approximation and pseudo-polynomial time algorithms for the problem. We further prove that if the taxonomy is a directed acyclic graph, given some generally accepted hypothesis, it is not possible to find any approximation algorithm with reasonably small approximation ratio for the problem. Our empirical study using real-world data sets, taxonomies, and query workloads shows that our framework effectively quantifies the amount by which a conceptual design improves the effectiveness of answering queries. It also indicates that our algorithms are efficient for a design-time task with pseudo-polynomial algorithm being generally more effective than the approximation algorithm. |
Databáze: | OpenAIRE |
Externí odkaz: |