Representation Learning with LDA Models for Entity Disambiguation in Specific Domains

Authors: Yantuan Xian, Hongbin Wang, Huaqin Li, Zhiju Zhang, Shengchen Jiang
Year of publication: 2021
Subject:
Source: Journal of Advanced Computational Intelligence and Intelligent Informatics. 25:326-334
ISSN: 1883-8014, 1343-0130
DOI: 10.20965/jaciii.2021.p0326
Description: Entity disambiguation is extremely important in knowledge base construction. Word representation models ignore the influence of word order on sentence- and text-level information. We therefore propose a domain entity disambiguation method that fuses the doc2vec and LDA topic models. In this study, the doc2vec document representation model is used to obtain vector forms of the entity mention and the candidate entities from the domain corpus and the knowledge base, respectively. Context similarity and category referential similarity are then computed over a constructed domain knowledge base of hypernym-hyponym relations. The LDA topic model and the doc2vec model are combined to obtain distinct word representations for the different senses of polysemous words. The k-means algorithm is used to cluster the word vectors under different topics to obtain the topic-domain keywords of the text, and similarity is computed under the domain keywords of the different topics. Finally, the similarities of the three feature types are fused, and the candidate entity with the highest fused similarity is selected as the final target entity. The experimental results demonstrate that the proposed method outperforms existing models, confirming its feasibility and effectiveness.
Database: OpenAIRE
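The final decision step described in the abstract (fusing three similarity scores and picking the highest-scoring candidate entity) can be sketched as below. This is a minimal illustration under assumptions, not the authors' implementation: the function names, the equal fusion weights, and the input format are all hypothetical, and the context similarity is computed here as plain cosine similarity between doc2vec-style vectors.

```python
import math

def cosine(u, v):
    # Cosine similarity between two dense vectors (e.g., doc2vec embeddings).
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def fuse_similarities(context_sim, category_sim, topic_sim,
                      weights=(1 / 3, 1 / 3, 1 / 3)):
    # Linear fusion of the three feature similarities; equal weights are
    # an assumption, the paper does not specify the fusion weights here.
    w1, w2, w3 = weights
    return w1 * context_sim + w2 * category_sim + w3 * topic_sim

def disambiguate(mention_vec, candidates):
    # candidates: list of (entity_name, context_vec, category_sim, topic_sim),
    # where category_sim and topic_sim are assumed precomputed upstream.
    best_name, best_score = None, -1.0
    for name, vec, cat_sim, top_sim in candidates:
        score = fuse_similarities(cosine(mention_vec, vec), cat_sim, top_sim)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score
```

Usage: `disambiguate(mention_vector, candidate_list)` returns the candidate entity whose fused score is highest, mirroring the "highest similarity degree" selection in the abstract.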