Learning Chinese Word Segmentation Based on Bidirectional GRU-CRF and CNN Network Model

Authors: Chenghai Yu, Shu-pei Wang, Jia-jun Guo
Year of publication: 2019
Subject:
Source: International Journal of Technology and Human Interaction. 15:47-62
ISSN: 1548-3916, 1548-3908
DOI: 10.4018/ijthi.2019070104
Description: Chinese word segmentation is the foundation of Chinese natural language processing (NLP). With the development of deep learning, various neural network models have been applied to the task. However, current neural network models for Chinese word segmentation rely on manual feature extraction, use nonstandard word weights, cannot make effective use of long-distance information, and take a long time to train. To address these problems, this article presents a CNN-bidirectional GRU-CRF neural network model (CNN Bidirectional GRU CRF Network, CBiGCN). The model removes the fixed context window limit of conventional methods and achieves end-to-end processing; it is combined with a five-tag set method and a bias-variable-weight greedy strategy, supplemented by the Goldstein-Armijo rule. The model has a simple structure and is easy to operate. It learns features automatically, reduces the task-specific work of handcrafting features and pre-processing data, and makes effective use of context information. The authors evaluate the system in an experiment on two Chinese word segmentation corpora. The experiment shows that the new model achieves better segmentation results and greatly reduces training time.
Database: OpenAIRE