Microsyntactic Annotation of Corpora and its Use in Computational Linguistics Tasks
Author: | Leonid L. Iomdin |
---|---|
Year of publication: | 2017 |
Subject: | Linguistics and Language; syntactic idioms; microsyntactic dictionary; Computer science; Russian syntactically tagged corpus SynTagRus; Language and Linguistics; Text corpora; lcsh:Philology. Linguistics; Annotation; lcsh:P1-1091; microsyntactic annotation; Artificial intelligence; Computational linguistics; Natural language processing |
Source: | Jazykovedný Časopis, Vol 68, Iss 2, Pp 169-178 (2017) |
ISSN: | 1338-4287 0021-5597 |
DOI: | 10.1515/jazcas-2017-0027 |
Description: | Microsyntax is a linguistic discipline dealing with idiomatic elements whose important properties are strongly related to syntax. In a way, these elements may be viewed as transitional entities between the lexicon and the grammar, which explains why they are often underrepresented in both of these resource types: the lexicographer fails to see such elements as full-fledged lexical units, while the grammarian finds them too specific to justify the creation of individual well-developed rules. As a result, such elements are poorly covered by linguistic models used in advanced modern computational linguistic tasks like high-quality machine translation or deep semantic analysis. A possible way to mend the situation and improve the coverage and adequate treatment of microsyntactic units in linguistic resources is to develop corpora with microsyntactic annotation, closely linked to specially designed lexicons. The paper shows how this task is solved in the deeply annotated corpus of Russian, SynTagRus. |
Database: | OpenAIRE |