A Comparison of Lithuanian Morphological Analyzers
Authors: | Erika Rimkutė, Jurgita Kapočiūtė-Dzikienė, Loïc Boizou |
---|---|
Year of publication: | 2017 |
Subject: | 060201 languages & linguistics; Computer science; business.industry; Lemmatisation; 05 social sciences; 050401 social sciences methods; 06 humanities and the arts; Lithuanian; computer.software_genre; language.human_language; Annotation; 0504 sociology; 0602 languages and literature; language; Artificial intelligence; business; computer; Natural language processing; Strengths and weaknesses |
Source: | Text, Speech, and Dialogue (TSD), ISBN: 9783319642055 |
DOI: | 10.1007/978-3-319-64206-2_6 |
Description: | In this paper, we present a comparative study revealing the strengths and weaknesses of the two most popular publicly available Lithuanian morphological analyzers, Lemuoklis and Semantika.lt. Their performance on lemmatization, part-of-speech tagging, and fine-grained annotation of morphological categories (such as case, gender, and tense) was evaluated on a morphologically annotated gold-standard corpus covering four domains: administrative, fiction, scientific, and periodical texts. Semantika.lt significantly outperformed Lemuoklis by ~1.7%, ~2.5%, and ~8.1% on the lemmatization, part-of-speech tagging, and fine-grained annotation tasks, achieving accuracies of ~98.0%, ~95.3%, and ~86.8%, respectively. |
Database: | OpenAIRE |
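The abstract reports three separate accuracies: lemmatization, part-of-speech tagging, and fine-grained annotation. The following is a minimal sketch of how such token-level accuracies might be computed against a gold-standard corpus. It assumes that the analyzer output is already token-aligned with the gold annotation and that a fine-grained tag counts as correct only when the POS and every morphological feature match exactly. The `Token` schema, the `accuracies` function, and the feature labels are hypothetical illustrations. They are not the paper's evaluation code or the output format of Lemuoklis or Semantika.lt.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Token:
    """One annotated token (hypothetical schema, not either analyzer's format)."""
    form: str
    lemma: str
    pos: str
    feats: Dict[str, str]  # illustrative labels, e.g. {"Case": "Nom", "Gender": "Fem"}


def accuracies(gold: List[Token], pred: List[Token]) -> Dict[str, float]:
    """Token-level accuracy for lemma, POS, and the full morphological tag.

    Assumes gold and pred are aligned one-to-one by token position.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and predicted sequences must be token-aligned")
    if not gold:
        raise ValueError("cannot score an empty corpus")
    n = len(gold)
    pairs = list(zip(gold, pred))
    lemma_ok = sum(g.lemma == p.lemma for g, p in pairs)
    pos_ok = sum(g.pos == p.pos for g, p in pairs)
    # Fine-grained: the POS and all morphological features must match exactly.
    fine_ok = sum(g.pos == p.pos and g.feats == p.feats for g, p in pairs)
    return {
        "lemmatization": lemma_ok / n,
        "pos_tagging": pos_ok / n,
        "fine_grained": fine_ok / n,
    }


# Usage: "Mergaitė skaito knygą." ("The girl reads a book.")
gold = [
    Token("Mergaitė", "mergaitė", "NOUN", {"Case": "Nom", "Gender": "Fem", "Number": "Sing"}),
    Token("skaito", "skaityti", "VERB", {"Tense": "Pres", "Person": "3"}),
    Token("knygą", "knyga", "NOUN", {"Case": "Acc", "Gender": "Fem", "Number": "Sing"}),
]
# Hypothetical analyzer output: lemmas and POS are right, but the case of "knygą" is wrong.
pred = [
    Token("Mergaitė", "mergaitė", "NOUN", {"Case": "Nom", "Gender": "Fem", "Number": "Sing"}),
    Token("skaito", "skaityti", "VERB", {"Tense": "Pres", "Person": "3"}),
    Token("knygą", "knyga", "NOUN", {"Case": "Gen", "Gender": "Fem", "Number": "Sing"}),
]
scores = accuracies(gold, pred)
print(scores)
```

In this toy example, lemmatization and POS accuracy are both 1.0, while fine-grained accuracy drops to 2/3. This illustrates why an exact-match criterion on the full tag yields lower scores than the coarser tasks, which is consistent with the ordering of the accuracies reported in the abstract.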