Joint Fine-Grained Components Continuously Enhance Chinese Word Embeddings

Authors:	Chengyang Zhuang, Yuanjie Zheng, Wenhui Huang, Weikuan Jia
Language:	English
Publication year:	2019
Subject:	Chinese word embedding stroke sub-character character language n-grams; Electrical engineering. Electronics. Nuclear engineering; TK1-9971
Source:	IEEE Access, Vol 7, Pp 174699-174708 (2019)
Document type:	article
ISSN:	2169-3536
DOI:	10.1109/ACCESS.2019.2956822
Description:	The most common approach to word embedding is to learn word vector representations from the context information in large-scale text. However, Chinese words usually consist of characters, subcharacters, and strokes, and each of these components carries rich semantic information. The quality of Chinese word vectors is related to the accuracy of prediction. Therefore, to obtain high-quality Chinese character embedding, we propose a continuously enhanced word embedding model. The model starts from fine-grained strokes and adjacent-stroke information and enhances the subcharacter embedding by incorporating a vector representation of the relationships between strokes. Similarly, building on the enhanced subcharacter embedding, we combine the subcharacter relationship vector and the character relationship vector to learn the Chinese word embedding. We construct the underlying stroke n-grams and adjacent stroke n-grams and extract the relationship vector that strengthens the relationships between components; this vector is then used to learn the Chinese word embedding and improve accuracy. Finally, we evaluate our model on word similarity and word reasoning tasks.
Database:	Directory of Open Access Journals
External link:	https://doaj.org/article/3dfae04e38a449ce95369359a5798dc2