Incorporating Word Attention into Character-Based Word Segmentation
Authors: | Masao Utiyama, Eiichiro Sumita, Shohei Higashiyama, Yoshiaki Oida, Masao Ideuchi, Isaac Okada, Yohei Sakamoto |
---|---|
Year of publication: | 2019 |
Subject: |
Feature engineering; Artificial neural network; Computer science; business.industry; Text segmentation; Inference; 02 engineering and technology; 010501 environmental sciences; Machine learning; computer.software_genre; 01 natural sciences; 0202 electrical engineering, electronic engineering, information engineering; Leverage (statistics); 020201 artificial intelligence & image processing; Segmentation; Artificial intelligence; business; computer; 0105 earth and related environmental sciences |
Source: | NAACL-HLT (1) |
DOI: | 10.18653/v1/n19-1276 |
Description: | Neural network models have been actively applied to word segmentation, especially for Chinese, because they minimize the effort required for feature engineering. Typical segmentation models are either character-based, which allows exact inference, or word-based, which allows the use of word-level information. We propose a character-based model that uses word information to combine the advantages of both types. Our model uses an attention mechanism to learn the importance of multiple candidate words for each character and uses these weights in its segmentation decisions. Experimental results show that our model outperforms state-of-the-art models on both Japanese and Chinese benchmark datasets. |
Database: | OpenAIRE |