Pooled Contextualized Embeddings for Named Entity Recognition
Authors: | Roland Vollgraf, Alan Akbik, Tanja Bergmann |
---|---|
Year of publication: | 2019 |
Subject: |
Word embedding; Computer science; String (computer science); Context (language use); Sequence labeling; Character (mathematics); Named-entity recognition; Artificial intelligence; Language model; Natural language processing; 02 engineering and technology; 0202 electrical engineering, electronic engineering, information engineering; 020204 information systems; 020201 artificial intelligence & image processing |
Source: | NAACL-HLT (1) |
DOI: | 10.18653/v1/n19-1078 |
Description: | Contextual string embeddings are a recent type of contextualized word embedding that were shown to yield state-of-the-art results when utilized in a range of sequence labeling tasks. They are based on character-level language models which treat text as distributions over characters and are capable of generating embeddings for any string of characters within any textual context. However, such purely character-based approaches struggle to produce meaningful embeddings if a rare string is used in an underspecified context. To address this drawback, we propose a method in which we dynamically aggregate contextualized embeddings of each unique string that we encounter. We then use a pooling operation to distill a "global" word representation from all contextualized instances. We evaluate these "pooled contextualized embeddings" on common named entity recognition (NER) tasks such as CoNLL-03 and WNUT and show that our approach significantly improves the state of the art for NER. We make all code and pre-trained models available to the research community for use and reproduction. |
Database: | OpenAIRE |
External link: |
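
The aggregation and pooling step summarized in the abstract above can be illustrated with a minimal sketch. It is not the authors' released code: the class name `PooledMemory`, the plain-list vector type, and the choice to concatenate the local embedding with the pooled one are assumptions made for illustration, and the contextual embeddings are assumed to come from some external character-level language model.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence

Vector = List[float]
PoolFn = Callable[[Sequence[Vector]], Vector]


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean over all stored vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def max_pool(vectors: Sequence[Vector]) -> Vector:
    """Element-wise maximum over all stored vectors."""
    return [max(column) for column in zip(*vectors)]


def min_pool(vectors: Sequence[Vector]) -> Vector:
    """Element-wise minimum over all stored vectors."""
    return [min(column) for column in zip(*vectors)]


class PooledMemory:
    """Hypothetical helper: remembers every contextual embedding seen
    for each unique string and pools them into a 'global' vector."""

    def __init__(self, pool: PoolFn = mean_pool) -> None:
        self.pool = pool
        self.memory: Dict[str, List[Vector]] = defaultdict(list)

    def embed(self, word: str, contextual: Vector) -> Vector:
        # Add the new contextualized instance to this string's memory.
        self.memory[word].append(list(contextual))
        # Distill a global representation from all instances so far.
        pooled = self.pool(self.memory[word])
        # Assumption: the final embedding is local + pooled, concatenated.
        return list(contextual) + pooled


# Usage: the same rare string seen in two different contexts.
memory = PooledMemory(mean_pool)
first = memory.embed("Fung", [1.0, 3.0])   # [1.0, 3.0, 1.0, 3.0]
second = memory.embed("Fung", [3.0, 5.0])  # [3.0, 5.0, 2.0, 4.0]
```

Because the memory grows as more text is processed, the pooled half of the vector for a string changes over time, while the local half always reflects the current context.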