Popis: |
The Semantic Web emerged in response to the unprecedented growth of information and data sharing on the Web. It consists of a set of technologies that enable the automatic (machine) management and processing of linked data across hundreds of distributed repositories. To connect and interlink data, the Semantic Web uses the Resource Description Framework (RDF), a graph-based data model that describes resources as triples (subject, predicate, object). Data represented in RDF usually follows an ontology, a knowledge base model that dictates the relationships and characteristics of the linked data. Ontologies are therefore a key component of the Semantic Web. However, an ontology may be incorrect or, in some cases, unavailable. Ontologies are generally created manually by domain experts in collaboration with ontology engineers, which is a costly and error-prone task. In this study, we present a proposal to automatically generate ontologies from RDF datasets. We use summarization techniques to reduce the number of triples and retain the most relevant ones. Classes, datatype properties, and object properties, together with the domain and range of each property, are then identified to construct the schema. The schema is further enriched with Object Property axioms. The result is a serialized ontology document in OWL/XML format. We also present an experimental evaluation on an RDF dataset of 16,005 triples. Our summarization technique reduced the original dataset by 98%, and the ontology was generated in 148.73 seconds. In total, 9 classes, 10 Object Properties, 6 Datatype Properties, and 4 different types of Object Property axioms were identified. |
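
To make the schema-construction step concrete, the following is a minimal sketch of how classes, datatype properties, object properties, and their domains and ranges might be read off a set of triples. It is an illustration under stated assumptions, not the method proposed in the study: the function `infer_schema`, the helper `is_literal`, the `ex:` identifiers, and the convention that literals are written in double quotes are all hypothetical, and the summarization, axiom enrichment, and OWL/XML serialization steps are omitted.

```python
# Hypothetical predicate name used to mark class membership in this sketch.
RDF_TYPE = "rdf:type"


def is_literal(term):
    # Assumption for this sketch: literals are written with surrounding double quotes.
    return term.startswith('"')


def infer_schema(triples):
    """Derive classes and property domains/ranges from (subject, predicate, object) triples."""
    # First pass: record the declared types of every typed resource.
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)

    # Every object of an rdf:type triple is treated as a class.
    classes = set().union(*types.values())

    # Second pass: a property with a literal object is a datatype property,
    # otherwise it is an object property. Domains and ranges are the union
    # of the declared types of the subjects and objects it connects.
    datatype_props = {}
    object_props = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue
        if is_literal(o):
            entry = datatype_props.setdefault(p, {"domain": set()})
            entry["domain"] |= types.get(s, set())
        else:
            entry = object_props.setdefault(p, {"domain": set(), "range": set()})
            entry["domain"] |= types.get(s, set())
            entry["range"] |= types.get(o, set())
    return classes, datatype_props, object_props


# Small hand-made example dataset (not from the study).
triples = [
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:acme", "rdf:type", "ex:Company"),
    ("ex:alice", "ex:worksFor", "ex:acme"),
    ("ex:alice", "ex:name", '"Alice"'),
]
classes, datatype_props, object_props = infer_schema(triples)
```

In this toy dataset, `ex:worksFor` links two typed resources and so is classified as an object property, while `ex:name` points to a literal and so is classified as a datatype property.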