Incorporating Word Attention into Character-Based Word Segmentation

Authors: Masao Utiyama, Eiichiro Sumita, Shohei Higashiyama, Yoshiaki Oida, Masao Ideuchi, Isaac Okada, Yohei Sakamoto
Year of publication: 2019
Subject:
Source: NAACL-HLT (1)
DOI: 10.18653/v1/n19-1276
Description: Neural network models have been actively applied to word segmentation, especially for Chinese, because they minimize the effort required for feature engineering. Typical segmentation models are either character-based, which permits exact inference, or word-based, which permits the use of word-level information. We propose a character-based model that uses word information to combine the advantages of both types of models. Our model uses an attention mechanism to learn the importance of multiple candidate words for each character, and uses these learned weights in its segmentation decisions. The experimental results show that our model achieves better performance than the state-of-the-art models on both Japanese and Chinese benchmark datasets.
Database: OpenAIRE
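
The idea in the description of weighting several candidate words per character with attention can be illustrated with a minimal sketch. This is not the authors' architecture: the function names (`candidate_words`, `attend`), the lexicon lookup, the `max_len` limit, and the plain dot-product scoring are all assumptions made for illustration, and the vectors stand in for learned embeddings.

```python
import math


def candidate_words(sentence, lexicon, max_len=4):
    """For each character position, list the lexicon words that cover it.

    Hypothetical helper: a real system would define its own candidate set.
    """
    cands = [[] for _ in sentence]
    n = len(sentence)
    for start in range(n):
        for end in range(start + 1, min(n, start + max_len) + 1):
            word = sentence[start:end]
            if word in lexicon:
                # The word spans positions start..end-1, so it is a
                # candidate for every character in that span.
                for i in range(start, end):
                    cands[i].append(word)
    return cands


def softmax(scores):
    """Numerically stable softmax over a non-empty list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(char_vec, word_vecs):
    """Dot-product attention of one character vector over its candidate words.

    Returns (weights, summary), where summary is the attention-weighted
    sum of the candidate word vectors. Assumed scoring function; the
    paper's actual scoring may differ.
    """
    dim = len(char_vec)
    if not word_vecs:
        # No candidate word covers this character.
        return [], [0.0] * dim
    scores = [sum(c * w for c, w in zip(char_vec, wv)) for wv in word_vecs]
    weights = softmax(scores)
    summary = [
        sum(a * wv[d] for a, wv in zip(weights, word_vecs)) for d in range(dim)
    ]
    return weights, summary


if __name__ == "__main__":
    # Toy lexicon and toy 2-dimensional vectors, chosen only for the demo.
    cands = candidate_words("abcd", {"ab", "bc", "abc", "d"})
    print(cands)
    weights, summary = attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
    print(weights, summary)
```

In a full character-based segmenter, the summary vector for each character would typically be combined with that character's own representation before a tagging layer predicts the boundary label; that step is omitted here.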