LDA-based term profiles for expert finding in a political setting
Author: | Luis Redondo-Expósito, Luis M. de Campos, Juan M. Fernández-Luna, Juan F. Huete |
---|---|
Year of publication: | 2021 |
Subject: | Information retrieval; Computer Networks and Communications; Computer science; Expert finding; Dice; 02 engineering and technology; Recommender system; Latent Dirichlet allocation; Field (computer science); Term (time); Task (project management); Artificial Intelligence; Hardware and Architecture; 020204 information systems; Similarity (psychology); 0202 electrical engineering, electronic engineering, information engineering; User profiles; Set (psychology); Software; Information Systems |
Source: | Journal of Intelligent Information Systems. 56:529-559 |
ISSN: | 1573-7675 0925-9902 |
Description: | A common task in many political institutions (e.g. a parliament) is to find politicians who are experts in a particular field. To tackle this problem, the first step is to obtain politician profiles that capture their interests, and these can be learned automatically from their speeches. As a politician may have several areas of expertise, one alternative is to use a set of subprofiles, each of which covers a different subject. In this study, we propose a novel approach to this task that uses latent Dirichlet allocation (LDA) to determine the main underlying topics of each political speech and to distribute the related terms among the different topic-based subprofiles. To this end, we propose the use of fifteen distance and similarity measures to automatically determine the optimal number of topics discussed in a document, and we show that all of these measures converge to five strategies: Euclidean, Dice, Sorensen, Cosine and Overlap. Our experimental results showed that the scores of the different accuracy metrics for the proposed strategies tended to be higher than those of the baselines in expert recommendation tasks, and that using an appropriate number of topics proved relevant. This work has been funded by the Spanish Ministerio de Economía y Competitividad under projects TIN2016-77902-C3-2-P and PID2019-106758GB-C31, and the European Regional Development Fund (ERDF-FEDER). |
Database: | OpenAIRE |
External link: |