Representation Learning with LDA Models for Entity Disambiguation in Specific Domains
Authors: | Yantuan Xian, Hongbin Wang, Huaqin Li, Zhiju Zhang, Shengchen Jiang |
---|---|
Year of publication: | 2021 |
Subject: | Computer science; k-means clustering; Human-Computer Interaction; Artificial Intelligence; Computer Vision and Pattern Recognition; Feature learning; Natural language processing; 02 engineering and technology; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; 0209 industrial biotechnology; 020901 industrial engineering & automation |
Source: | Journal of Advanced Computational Intelligence and Intelligent Informatics. 25:326-334 |
ISSN: | 1883-8014 1343-0130 |
DOI: | 10.20965/jaciii.2021.p0326 |
Description: | Entity disambiguation is extremely important in knowledge construction. Word representation models ignore the influence of word order on the information carried by a sentence or text. We therefore propose a domain entity disambiguation method that fuses the doc2vec and LDA topic models. In this study, the doc2vec document representation model is used to obtain vector forms of the entity mention (reference item) and the candidate entities from the domain corpus and the knowledge base, respectively. Context similarity and category reference similarity are then calculated using a constructed domain knowledge base of upper-lower (hypernym-hyponym) relations. The LDA topic model and the doc2vec model are used to obtain word representations for the different senses of polysemous words. We use the k-means algorithm to cluster the word vectors under different topics to obtain the topic domain keywords of the text, and perform similarity calculations under the domain keywords of the different topics. Finally, the similarities of the three feature types are merged, and the candidate entity with the highest similarity is taken as the final target entity. The experimental results show that the proposed method outperforms the existing model, supporting its feasibility and effectiveness. |
Database: | OpenAIRE |
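The final step in the description (merging three similarity features and choosing the highest-scoring candidate) can be illustrated with a minimal sketch. The record does not say how the three similarities are merged, so the weighted sum, the equal default weights, and the names `cosine`, `fuse`, and `pick_entity` are assumptions for illustration only, not the authors' method. The similarity scores are assumed to be precomputed, for example as cosine similarities between doc2vec vectors.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def fuse(context_sim, category_sim, topic_sim, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Merge the three feature similarities with a weighted sum.

    ASSUMPTION: a weighted sum with equal weights; the source does not
    state the actual fusion rule.
    """
    w_ctx, w_cat, w_top = weights
    return w_ctx * context_sim + w_cat * category_sim + w_top * topic_sim


def pick_entity(candidates, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Return the candidate name with the highest fused similarity.

    `candidates` maps a candidate entity name to a tuple
    (context_sim, category_sim, topic_sim).
    """
    return max(candidates, key=lambda name: fuse(*candidates[name], weights=weights))


# Hypothetical scores for an ambiguous mention with two candidate entities.
example_candidates = {
    "apple_fruit": (0.2, 0.1, 0.3),
    "apple_company": (0.8, 0.7, 0.6),
}
best = pick_entity(example_candidates)
```

Here `best` is the candidate whose three scores give the largest weighted sum. In practice the weights would need to be tuned on held-out data.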