Domain-Specific Chinese Word Segmentation Based on Bi-Directional Long-Short Term Memory Model
Author: | Yan Xiang, Dangguo Shao, Zhaoqiang Yang, Na Zheng, Zhengtao Yu, Zhenhua Chen, Yantuan Xian |
---|---|
Language: | English |
Year of publication: | 2019 |
Subject: | General Computer Science; Computer science; Inference; 0102 computer and information sciences; 02 engineering and technology; 01 natural sciences; Field (computer science); 0202 electrical engineering, electronic engineering, information engineering; General Materials Science; Segmentation; domain-specific; Artificial neural network; business.industry; Deep learning; Text segmentation; General Engineering; Pattern recognition; combination of weight; Bi-directional long-short term memory (Bi-directional LSTM) model; 010201 computation theory & mathematics; Domain knowledge; 020201 artificial intelligence & image processing; Artificial intelligence; lcsh:Electrical engineering. Electronics. Nuclear engineering; business; Chinese word segmentation; lcsh:TK1-9971; Word (computer architecture) |
Source: | IEEE Access, Vol. 7, pp. 12993-13002 (2019) |
ISSN: | 2169-3536 |
Description: | Most current word segmentation methods are either rule-based or built on traditional machine learning. General-purpose word segmentation tools perform poorly in specialized fields such as metallurgy, and domain-specific Chinese word segmentation has received little study. In recent years, with the development of deep learning, neural networks have proved effective for Chinese word segmentation. However, this performance relies on large-scale training data: networks with conventional architectures cannot achieve the desired results on low-resource datasets because labeled training data are scarce. Taking metallurgy as an example, this paper proposes a domain-specific Chinese word segmentation method for the metallurgical field based on a bidirectional long short-term memory (Bi-directional LSTM) model. First, a word segmentation model is obtained by training the Bi-directional LSTM on both internal and external domain knowledge. Next, a series of parameter-tuning experiments is carried out, and the label probabilities of words are combined with weights. Finally, the segmentation result is produced by a label inference layer. Experimental results show that the proposed method achieves better word segmentation in the metallurgical field. |
Database: | OpenAIRE |
External link: |
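
The description outlines three stages: a Bi-directional LSTM that yields per-character label probabilities, a weighted combination of those probabilities, and a label inference layer. The record does not state the tag set, how the weights are applied, or which inference algorithm is used. The following is a minimal sketch under stated assumptions, not the authors' method: it assumes a BMES tag scheme, treats "internal" and "external" domain knowledge as two separate probability sources mixed by one hypothetical weight `w`, and uses Viterbi decoding with hard BMES transition constraints as the inference step. The BiLSTM itself is replaced by made-up probabilities, and the names `combine`, `viterbi`, and `to_words` are hypothetical.

```python
import math

# Assumed BMES scheme: Begin/Middle/End of a multi-character word, Single-character word.
TAGS = ["B", "M", "E", "S"]
ALLOWED = {"B": {"M", "E"}, "M": {"M", "E"}, "E": {"B", "S"}, "S": {"B", "S"}}
START = ("B", "S")  # tags allowed on the first character
END = ("E", "S")    # tags allowed on the last character


def combine(p_in, p_out, w):
    """Hypothetical weighting: mix in-domain and general-domain label probabilities."""
    return [{t: w * a[t] + (1 - w) * b[t] for t in TAGS} for a, b in zip(p_in, p_out)]


def viterbi(probs):
    """Return the most probable valid BMES tag sequence (log-space Viterbi)."""
    eps = 1e-12  # avoids log(0)
    score = {t: math.log(probs[0][t] + eps) if t in START else -math.inf for t in TAGS}
    back = []
    for p in probs[1:]:
        new, ptr = {}, {}
        for t in TAGS:
            # Best predecessor among tags that may legally precede t.
            prev = max((s for s in TAGS if t in ALLOWED[s]), key=lambda s: score[s])
            new[t] = score[prev] + math.log(p[t] + eps)
            ptr[t] = prev
        score = new
        back.append(ptr)
    path = [max(END, key=lambda t: score[t])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


def to_words(chars, tags):
    """Cut the character sequence after every E or S tag."""
    words, cur = [], ""
    for c, t in zip(chars, tags):
        cur += c
        if t in END:
            words.append(cur)
            cur = ""
    if cur:
        words.append(cur)
    return words


# Made-up per-character probabilities for 高炉炼铁 ("blast-furnace ironmaking").
p_in = [  # hypothetical in-domain model: favors 高炉 / 炼铁
    {"B": 0.7, "M": 0.1, "E": 0.1, "S": 0.1},
    {"B": 0.1, "M": 0.1, "E": 0.7, "S": 0.1},
    {"B": 0.7, "M": 0.1, "E": 0.1, "S": 0.1},
    {"B": 0.1, "M": 0.1, "E": 0.7, "S": 0.1},
]
p_out = [  # hypothetical general-domain model: splits 高 / 炉
    {"B": 0.2, "M": 0.1, "E": 0.1, "S": 0.6},
    {"B": 0.2, "M": 0.1, "E": 0.1, "S": 0.6},
    {"B": 0.6, "M": 0.1, "E": 0.2, "S": 0.1},
    {"B": 0.1, "M": 0.1, "E": 0.6, "S": 0.2},
]

print(to_words("高炉炼铁", viterbi(combine(p_in, p_out, 0.7))))  # ['高炉', '炼铁']
print(to_words("高炉炼铁", viterbi(combine(p_in, p_out, 0.0))))  # ['高', '炉', '炼铁']
```

With `w = 0.7` the in-domain scores dominate and the domain term 高炉 ("blast furnace") is kept whole; with `w = 0.0` only the general-domain probabilities are used and it is split into single characters. The transition constraints keep the decoder from emitting invalid sequences such as `S` followed by `E`, which taking the per-character argmax alone would not prevent.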