CzeDLex – A Lexicon of Czech Discourse Connectives
Author: | Lucie Poláková, Magdaléna Rysová, Jiří Mírovský, Pavlína Synková |
---|---|
Year of publication: | 2017 |
Subject: |
060201 languages & linguistics
Czech; business.industry; 06 humanities and the arts; 02 engineering and technology; Lexicon; computer.software_genre; Discourse connectives; language.human_language; Linguistics; 0602 languages and literature; Computational linguistics. Natural language processing; 0202 electrical engineering, electronic engineering, information engineering; language; 020201 artificial intelligence & image processing; Artificial intelligence; P98-98.5; business; computer; Natural language processing |
Source: | Prague Bulletin of Mathematical Linguistics, Vol 109, Iss 1, Pp 61-91 (2017) |
ISSN: | 1804-0462 |
DOI: | 10.1515/pralin-2017-0039 |
Description: | CzeDLex is a new electronic lexicon of Czech discourse connectives, planned for publication by the end of this year. Its data format and structure are based on a study of similar existing resources and adjusted to comply with the Czech syntactic tradition and its specifics, and with the Prague approach to the annotation of semantic discourse relations in text. In the article, we first put the lexicon in the context of related resources and discuss theoretical aspects of building it: we present arguments for our choice of data structure and for the selection of features in the lexicon entries, paying special attention to a consistent and (as far as possible) uniform encoding of both primary connectives (such as the English because, therefore) and secondary connectives (e.g. for this reason, this is the reason why). The main principle adopted for nesting entries in the lexicon, apart from the lexical form of the connective, is the discourse-semantic type (sense) expressed by the given connective, which enables us to deal with the broad formal variability of connectives and is convenient for interlinking CzeDLex with lexicons in other languages. Second, we introduce the chosen technical solution based on the Prague Markup Language, which allows for an efficient incorporation of the lexicon into the family of Prague treebanks: it can be directly opened and edited in the tree editor TrEd, processed from the command line in btred, interlinked with its source corpus and queried in the PML Tree Query engine. Third, we describe the process of obtaining data for the lexicon by exploiting a large corpus manually annotated with discourse relations, the Prague Discourse Treebank 2.0: we elaborate on the automatic extraction part, post-extraction checks and manual addition of supplementary linguistic information. |
Database: | OpenAIRE |