Croatian Network Lexicon within the Syntactic and Semantic Framework and LLOD Cloud

Authors: Marko Orešković, Sandra Lovrenčić, Mario Essert
Year of publication: 2018
Subject:
Source: International Journal of Lexicography. 32:207-227
ISSN: 1477-4577, 0950-3846
DOI: 10.1093/ijl/ecy024
Description: This paper presents a new type of network lexicon for the Croatian language based on a syntactic and semantic computational framework. It begins with an overview of the existing Croatian e-dictionaries and online repositories, as well as a brief outline of other relevant network ontological models. The network lexicon, which is based on an innovative approach to word tagging, is described in the remainder of the paper. Instead of presenting a linear (e.g. MULTEXT-East) structure, this paper proposes a new hierarchical tree-like T-structure that is very similar to the structure of an ontology. In this approach, each word is processed on multiple levels: from its internal structure (morphs or syllables), via links to external network resources (encyclopaedias), to multiword expressions that can have distinctive roles, such as semantic domains, collocations and even figurative expressions. A network framework facilitates the fetching and filtering of the information related to the searched word in a paradigmatic sense through the integration of CroWN, the Croatian version of the English WordNet, and in a syntagmatic sense by building the database of the T-structure patterns from a selected corpus. Finally, the network framework enables the dynamic integration of the lexicon with the Linguistic Linked Open Data cloud; thus, each change in the lexicon will be automatically reflected in the cloud. It is therefore not necessary to perform any periodic synchronisation of the data, a task that is quite common when working with triples stored in a Virtuoso database. Special attention has been paid to the technical components and the data preparation process, which are described in detail to serve as a guide for transforming existing lexicographic data into Linked Open Data triples.
Database: OpenAIRE