POS Tagging of Hungarian with Combined Statistical and Rule-Based Methods

Authors: András Hócza, János Csirik, András Kuba
Year of publication: 2004
Source: Text, Speech and Dialogue (TSD), ISBN: 9783540230496
DOI: 10.1007/978-3-540-30120-2_15
Description: This paper surveys the key results achieved so far in Hungarian POS tagging. The most successful approaches were selected and re-evaluated on a manually annotated corpus of 1.2 million words. Tests were performed in single-domain, multiple-domain, and cross-domain settings. We then investigate whether the selected POS tagging methods can be improved further by combining them. Our aim is to build a POS tagger that achieves good results on a fine-grained tag set of more than 1000 tags.
Database: OpenAIRE