Microsyntactic Annotation of Corpora and its Use in Computational Linguistics Tasks

Author:	Leonid L. Iomdin
Year of publication:	2017
Subject:	Linguistics and Language; syntactic idioms; microsyntactic dictionary; Computer science; Russian syntactically tagged corpus SynTagRus; Text corpora; lcsh:Philology. Linguistics; Annotation; lcsh:P1-1091; microsyntactic annotation; Artificial intelligence; Computational linguistics; Natural language processing
Source:	Jazykovedný Časopis, Vol 68, Iss 2, Pp 169-178 (2017)
ISSN:	1338-4287 0021-5597
DOI:	10.1515/jazcas-2017-0027
Description:	Microsyntax is a linguistic discipline dealing with idiomatic elements whose important properties are strongly related to syntax. In a way, these elements may be viewed as transitional entities between the lexicon and the grammar, which explains why they are often underrepresented in both of these resource types: the lexicographer fails to see such elements as full-fledged lexical units, while the grammarian finds them too specific to justify the creation of individual well-developed rules. As a result, such elements are poorly covered by the linguistic models used in advanced computational linguistics tasks such as high-quality machine translation or deep semantic analysis. A possible way to remedy the situation and improve the coverage and treatment of microsyntactic units in linguistic resources is to develop corpora with microsyntactic annotation, closely linked to specially designed lexicons. The paper shows how this task is addressed in SynTagRus, the deeply annotated corpus of Russian.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::0b6c1ad96692585e619ff4cbf07a5e4e https://doi.org/10.1515/jazcas-2017-0027