Learning Chinese Word Segmentation Based on Bidirectional GRU-CRF and CNN Network Model
Author: | Chenghai Yu, Shu-pei Wang, Jia-jun Guo |
---|---|
Publication year: | 2019 |
Subject: |
Structure (mathematical logic)
Artificial neural network; business.industry; Computer science; Deep learning; Feature extraction; Context (language use); 02 engineering and technology; Machine learning; computer.software_genre; Human-Computer Interaction; Set (abstract data type); 020204 information systems; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Segmentation; Artificial intelligence; business; computer; Information Systems; Network model |
Source: | International Journal of Technology and Human Interaction. 15:47-62 |
ISSN: | 1548-3916 1548-3908 |
DOI: | 10.4018/ijthi.2019070104 |
Description: | Chinese word segmentation is the basis of Chinese natural language processing (NLP). With the development of deep learning, various neural network models have been applied to Chinese word segmentation. However, current neural network models suffer from manual feature extraction, nonstandard word weights, an inability to use long-distance information effectively, and long training times. To address these problems, this article presents a CNN-Bidirectional GRU-CRF neural network model (CNN Bidirectional GRU CRF Network, CBiGCN), which breaks through the window limit of conventional methods and realizes end-to-end processing. The model applies the five-tag set method and a bias-variable-weight greedy strategy, supplemented by the Goldstein-Armijo guidelines. The model also has a simple structure and is easy to operate. It learns features automatically, reduces the large amount of task-specific knowledge otherwise required in the form of handcrafted features and data pre-processing, and makes effective use of context information. The authors ran an experiment on two corpora for Chinese word segmentation to evaluate their system. The experiment verified that the new model obtains better Chinese word segmentation results and greatly reduces training time. |
Database: | OpenAIRE |
External link: |