Feature-Rich Named Entity Recognition for Bulgarian Using Conditional Random Fields
Author: | Georgiev, G., Nakov, P., Ganchev, K., Osenova, P., Simov, K. |
---|---|
Year of publication: | 2021 |
Subject: | |
Source: | Scopus-Elsevier |
DOI: | 10.48550/arxiv.2109.15121 |
Description: | The paper presents a feature-rich approach to the automatic recognition and categorization of named entities (persons, organizations, locations, and miscellaneous) in Bulgarian news text. We combine well-established features used for other languages with language-specific lexical, syntactic, and morphological information. In particular, we make use of the rich tagset annotation of the BulTreeBank (680 morpho-syntactic tags), from which we derive suitable task-specific tagsets (local and nonlocal). We further add domain-specific gazetteers and additional unlabeled data, achieving F1=89.4%, which is comparable to state-of-the-art results for English. Comment: named entity recognition, NER, conditional random fields, CRF, Bulgarian, BulTreeBank |
Database: | OpenAIRE |
External link: |
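The description mentions combining general-purpose features with BulTreeBank morpho-syntactic tags and gazetteers in a CRF. The following is a minimal sketch of what per-token feature extraction of that kind might look like, producing one feature dictionary per token (the input format accepted by CRF wrappers such as sklearn-crfsuite). It is not the authors' feature set. The gazetteer entries, the illustrative BulTreeBank-style tags, and the `coarse_tag` reduction (keeping the first two tag characters) are all hypothetical assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical gazetteer (lowercased form -> entity type); not the paper's resource.
GAZETTEER: Dict[str, str] = {"софия": "LOC", "българия": "LOC"}


def word_shape(word: str) -> str:
    """Map chars to X (upper), x (lower), d (digit), else keep; collapse repeats."""
    shape: List[str] = []
    for ch in word:
        if ch.isupper():
            c = "X"
        elif ch.islower():
            c = "x"
        elif ch.isdigit():
            c = "d"
        else:
            c = ch
        if not shape or shape[-1] != c:
            shape.append(c)
    return "".join(shape)


def coarse_tag(tag: str, length: int = 2) -> str:
    """Hypothetical task-specific reduction: keep the first `length` tag characters."""
    return tag[:length]


def token_features(sent: List[Tuple[str, str]], i: int) -> Dict[str, object]:
    """Build a feature dict for token i of a sentence of (word, tag) pairs."""
    word, tag = sent[i]
    feats: Dict[str, object] = {
        "bias": 1.0,
        "word.lower": word.lower(),
        "word.shape": word_shape(word),
        "word.istitle": word.istitle(),
        "prefix3": word[:3].lower(),
        "suffix3": word[-3:].lower(),
        "tag.full": tag,              # full morpho-syntactic tag
        "tag.coarse": coarse_tag(tag),  # reduced (hypothetical) tagset
    }
    gaz = GAZETTEER.get(word.lower())
    if gaz is not None:
        feats["gazetteer"] = gaz
    # Context window of one token on each side.
    if i > 0:
        prev_word, prev_tag = sent[i - 1]
        feats["-1:word.lower"] = prev_word.lower()
        feats["-1:tag.coarse"] = coarse_tag(prev_tag)
    else:
        feats["BOS"] = True
    if i < len(sent) - 1:
        next_word, next_tag = sent[i + 1]
        feats["+1:word.lower"] = next_word.lower()
        feats["+1:tag.coarse"] = coarse_tag(next_tag)
    else:
        feats["EOS"] = True
    return feats


def sent_features(sent: List[Tuple[str, str]]) -> List[Dict[str, object]]:
    """One feature dict per token, as a sequence for a linear-chain CRF."""
    return [token_features(sent, i) for i in range(len(sent))]


# Illustrative sentence with made-up BulTreeBank-style tags.
example = [("Иван", "Npmsi"), ("живее", "Vpitf-r3s"), ("в", "R"),
           ("София", "Npfsi"), (".", "Punct")]
features = sent_features(example)
```

In a full pipeline, `features` for each training sentence would be paired with its gold label sequence (e.g. BIO tags for PER, ORG, LOC, MISC) and passed to a CRF trainer; the gazetteer and tag features let the model generalize beyond surface word forms.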