Representation Learning with LDA Models for Entity Disambiguation in Specific Domains

Author:	Yantuan Xian, Hongbin Wang, Huaqin Li, Zhiju Zhang, Shengchen Jiang
Year of publication:	2021
Subject:	0209 industrial biotechnology; Computer science; business.industry; k-means clustering; 02 engineering and technology; computer.software_genre; Human-Computer Interaction; 020901 industrial engineering & automation; Artificial Intelligence; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Computer Vision and Pattern Recognition; Artificial intelligence; business; Feature learning; computer; Natural language processing
Source:	Journal of Advanced Computational Intelligence and Intelligent Informatics. 25:326-334
ISSN:	1883-8014 1343-0130
DOI:	10.20965/jaciii.2021.p0326
Description:	Entity disambiguation is essential to knowledge base construction. Word representation models ignore the effect of word order on the meaning of a sentence or text. We therefore propose a domain entity disambiguation method that fuses doc2vec with the LDA topic model. The doc2vec document representation model is used to obtain vector representations of the entity mention and of the candidate entities from the domain corpus and the knowledge base, respectively. Context similarity and category-reference similarity are then computed using a constructed domain knowledge base of hypernym-hyponym (upper-lower) relations. The LDA topic model and the doc2vec model are used to obtain distinct word representations for the different senses of polysemous words. The k-means algorithm clusters the word vectors under each topic to obtain the topic domain keywords of the text, and similarity is computed over the domain keywords of the different topics. Finally, the three similarity features are merged, and the candidate entity with the highest similarity is selected as the target entity. Experimental results show that the proposed method outperforms existing models, supporting its feasibility and effectiveness.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::dda094e6b9aaaf0683a520b9d1f22580 https://doi.org/10.20965/jaciii.2021.p0326
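
The final step in the description above, merging three similarity scores and selecting the highest-scoring candidate, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the record does not say how the similarities are merged, so the equal weights are an assumption, and the toy vectors and the names `cosine` and `pick_entity` are hypothetical. In the described method the vectors would come from doc2vec and from the LDA/k-means topic keywords.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def pick_entity(mention, candidates, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Return (name, score) of the candidate with the highest fused similarity.

    `mention` and each candidate hold three vectors keyed "context",
    "category", and "topic". Equal weights are an assumption.
    """
    keys = ("context", "category", "topic")
    best_name, best_score = None, float("-inf")
    for name, vecs in candidates.items():
        # Weighted sum of the three per-feature cosine similarities.
        score = sum(w * cosine(mention[k], vecs[k]) for w, k in zip(weights, keys))
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score

# Toy data (hypothetical): two candidate entities for one mention.
mention = {"context": [1.0, 0.0], "category": [0.0, 1.0], "topic": [1.0, 1.0]}
candidates = {
    "entity_a": {"context": [1.0, 0.0], "category": [0.0, 1.0], "topic": [1.0, 1.0]},
    "entity_b": {"context": [0.0, 1.0], "category": [1.0, 0.0], "topic": [1.0, 1.0]},
}
best, score = pick_entity(mention, candidates)
print(best)
```

A weighted sum keeps the contribution of each feature explicit; other fusion rules (for example, learned weights) would fit the same interface.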