Entity Hierarchical Clustering Method Based on Multi-channel and T-SNE Dimension Reduction

Authors: Li Duan, Sijie Liu, Haojun Feng, Shukan Liu
Year of publication: 2020
Subject:
Source: 2020 IEEE 9th Joint International Information Technology and Artificial Intelligence Conference (ITAIC).
Description: Named entity clustering is a basic task in natural language processing that helps uncover implicit relationships between entities. Most existing clustering algorithms cannot combine multiple entity features and perform poorly in hierarchical clustering analysis. To address this, the paper proposes a multi-channel entity clustering method based on dimension reduction and verifies it experimentally. A multi-channel framework is constructed, with channels based on a knowledge graph, a language model, and statistics, which express the features of entities in objective knowledge, co-occurrence relationships, and different texts, respectively. A network embedding method, the BERT model, and an autoencoder are used, respectively, to convert the entities into vectors. The t-SNE algorithm then reduces the dimension of these vectors and maps them to two-dimensional space so that they can be visualized. An improved hierarchical clustering method clusters the entities in the two-dimensional space and constructs hierarchical clustering trees. Experiments show that the algorithm's F1 score reaches up to 78.72% on the test set. Analysis also indicates that the algorithm extends and generalizes well.
Database: OpenAIRE
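
Illustration: the pipeline outlined in the description (per-channel entity vectors, t-SNE reduction to two dimensions, hierarchical clustering of the 2-D points) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: random vectors stand in for the network-embedding, BERT, and autoencoder outputs; fusing channels by normalising and concatenating them is an assumption, since the record does not say how channels are combined; and standard average-linkage clustering from SciPy replaces the paper's improved hierarchical method, which the record does not specify. The function name `cluster_entities` and its parameters are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.manifold import TSNE


def cluster_entities(channels, n_clusters, perplexity=5.0, seed=0):
    """Fuse per-channel vectors, reduce to 2-D with t-SNE, then cluster.

    channels: list of arrays, each of shape (n_entities, dim_of_channel).
    Returns the 2-D points, the SciPy linkage matrix, and flat labels.
    """
    # Assumption: fuse channels by L2-normalising each row and concatenating.
    normalised = []
    for vectors in channels:
        norms = np.linalg.norm(vectors, axis=1, keepdims=True)
        normalised.append(vectors / np.maximum(norms, 1e-12))
    fused = np.hstack(normalised)

    # Map the fused vectors to two dimensions (perplexity must be < n_entities).
    tsne = TSNE(n_components=2, perplexity=perplexity, init="pca",
                random_state=seed)
    points_2d = tsne.fit_transform(fused)

    # Stand-in for the improved method: plain average-linkage clustering.
    tree = linkage(points_2d, method="average")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    return points_2d, tree, labels


# Placeholder data: 20 entities, three channels of different widths.
rng = np.random.default_rng(0)
channels = [
    rng.normal(size=(20, 16)),  # stand-in for knowledge-graph embeddings
    rng.normal(size=(20, 32)),  # stand-in for language-model vectors
    rng.normal(size=(20, 8)),   # stand-in for statistical (autoencoder) codes
]
points_2d, tree, labels = cluster_entities(channels, n_clusters=4)
```

The linkage matrix `tree` encodes the full merge hierarchy, so it can be passed to `scipy.cluster.hierarchy.dendrogram` to draw a clustering tree, while `labels` gives one flat cut of that tree into at most `n_clusters` groups.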