Towards the automatic extraction of definitions in Slavic

Authors: Beata Wójtowicz, Petya Osenova, Vladislav Kuboň, Adam Przepiórkowski, Kiril Simov, Lukasz Degórski, Miroslav Spousta, Lothar Lemnitzer
Year of publication: 2007
Source: Proceedings of the Workshop on Balto-Slavonic Natural Language Processing: Information Extraction and Enabling Technologies (ACL '07)
Abstract: This paper presents the results of preliminary experiments in the automatic extraction of definitions (for semi-automatic glossary construction) from e-learning texts in Bulgarian, Czech and Polish, which are usually unstructured or only weakly structured. The extraction is performed by regular grammars over XML-encoded, morphosyntactically annotated documents. The results are less than satisfactory, and we argue that this is due to the intrinsic difficulty of the task, as measured by the low inter-annotator agreement. This difficulty calls for deeper linguistic processing, as well as for the use of machine-learning classification techniques.
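The abstract only names the technique. Purely as an illustration, the following Python sketch shows what a regular grammar over morphosyntactically annotated XML might look like. It is a sketch under stated assumptions, not the paper's grammars or schema. The `<tok>` element, its `base` (lemma) and `ctag` (tag) attributes, the simplified tags `subst`/`verb`, the function `find_definitions` and the single copular pattern ("noun + form of *to be* + noun phrase") are all hypothetical. Each token is serialized as `lemma/tag `, so an ordinary regular expression can match token sequences, and each match is then mapped back to surface forms.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical annotation format (an assumption, not the paper's schema):
# <tok base="LEMMA" ctag="TAG">surface form</tok>
SAMPLE = """<par>
  <tok base="glosář" ctag="subst">Glosář</tok>
  <tok base="být" ctag="verb">je</tok>
  <tok base="seznam" ctag="subst">seznam</tok>
  <tok base="termín" ctag="subst">termínů</tok>
</par>"""

# Toy "regular grammar": a noun, a copula lemma (Polish/Czech/Bulgarian,
# assumed lemmatization), then one or more nouns. (?<!\S) anchors a match
# at a token boundary in the "lemma/tag " encoding.
DEFINITION = re.compile(r"(?<!\S)\S+/subst (?:być|být|съм)/verb (?:\S+/subst )+")


def find_definitions(xml_text: str) -> list[str]:
    """Return surface strings of token spans matching the copular pattern."""
    root = ET.fromstring(xml_text)
    tokens = [(t.text, t.get("base"), t.get("ctag")) for t in root.iter("tok")]
    # Every token ends with exactly one space, so counting spaces before an
    # offset in the encoded string gives the token index at that offset.
    encoded = "".join(f"{base}/{tag} " for _, base, tag in tokens)
    results = []
    for m in DEFINITION.finditer(encoded):
        start = encoded.count(" ", 0, m.start())
        end = encoded.count(" ", 0, m.end())
        results.append(" ".join(form for form, _, _ in tokens[start:end]))
    return results


if __name__ == "__main__":
    print(find_definitions(SAMPLE))
```

Run on the sample paragraph, the sketch returns the single candidate `Glosář je seznam termínů` ("A glossary is a list of terms"). Real definitional contexts are far more varied than one copular pattern, which is consistent with the abstract's remark that shallow grammars alone yield unsatisfactory results.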
Database: OpenAIRE