Towards the automatic extraction of definitions in Slavic
Authors: | Beata Wójtowicz, Petya Osenova, Vladislav Kuboň, Adam Przepiórkowski, Kiril Simov, Łukasz Degórski, Miroslav Spousta, Lothar Lemnitzer |
---|---|
Year of publication: | 2007 |
Subject: | Czech; Bulgarian; Slavic languages; Natural language processing; Deep linguistic processing; Glossary; Agreement; Statistical classification; Rule-based machine translation; Artificial intelligence; Computer science |
Source: | Proceedings of the Workshop on Balto-Slavonic Natural Language Processing: Information Extraction and Enabling Technologies - ACL '07. |
Description: | This paper presents the results of preliminary experiments in the automatic extraction of definitions (for semi-automatic glossary construction) from e-learning texts in Bulgarian, Czech and Polish, which are usually unstructured or only weakly structured. The extraction is performed by regular grammars over XML-encoded, morphosyntactically annotated documents. The results are less than satisfactory, and we claim that the reason is the intrinsic difficulty of the task, as measured by the low inter-annotator agreement. This calls for more sophisticated, deeper linguistic processing, as well as for the use of machine learning classification techniques. |
Database: | OpenAIRE |