Automated assessment of knowledge hierarchy evolution: comparing directed acyclic graphs

Authors: Guruprasad Nayak, Patrick K. Nicholson, Alessandra Sala, Sourav Dutta, Deepak Ajwani
Publication year: 2018
Subject:
Source: Information Retrieval Journal. 22:256-284
ISSN: 1573-7659, 1386-4564
DOI: 10.1007/s10791-018-9345-y
Description: Automated construction of knowledge hierarchies from huge data corpora has gained increasing attention in recent years as a way to tackle the infeasibility of manually extracting and semantically linking millions of concepts. As a knowledge hierarchy evolves under these automated techniques, measures are needed to assess its temporal evolution, quantifying the similarity between different versions and identifying the relative growth of different subgraphs in the hierarchy. In this paper, we focus on measures that leverage structural properties of the knowledge hierarchy graph to assess temporal changes. We propose a principled and scalable similarity measure, based on Katz similarity between concept nodes, for comparing different versions of a knowledge hierarchy modeled as a generic directed acyclic graph. We present a theoretical analysis showing that the proposed measure accurately captures the salient properties of taxonomic hierarchies and assesses changes in the ordering of nodes as well as in the logical subsumption relationships among concepts. We also present a linear-time variant of the measure and show that our measures, unlike previous approaches, are tunable to cater to diverse application needs. We further show that our measure is interpretable, identifying the key structural and logical differences between the hierarchies. Experiments on real DBpedia and biological knowledge hierarchies show that our measures accurately capture structural similarity while providing enhanced scalability and tunability. We also demonstrate that the temporal evolution of different subgraphs in the knowledge hierarchy, as captured purely by our structural measure, corresponds well with known disruptions in the related subject areas.
Database: OpenAIRE
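
The description refers to Katz similarity between concept nodes as the basis for comparing two versions of a hierarchy modeled as a directed acyclic graph. The sketch below is only an illustration of that general idea, not the measure defined in the paper: it computes the standard Katz index (the sum of beta ** length over all directed paths between two nodes) and then compares two edge lists with a simple normalized difference. The function names `katz_scores` and `hierarchy_similarity`, the default `beta=0.5`, and the normalization are all assumptions made for this example.

```python
from functools import lru_cache


def katz_scores(edges, beta=0.5):
    """Return {(u, v): score}, where score is the sum of beta ** length
    over all directed paths from u to v.

    Assumes the edge list forms a DAG, so every path is finite and the
    sum is well defined for any beta.
    """
    children = {}
    for u, v in edges:
        children.setdefault(u, []).append(v)
        children.setdefault(v, [])

    @lru_cache(maxsize=None)
    def from_node(u):
        # Katz scores from u to each of its descendants.
        scores = {}
        for w in children[u]:
            # Direct edge u -> w contributes beta.
            scores[w] = scores.get(w, 0.0) + beta
            # Every path w -> v extends to a path u -> w -> v.
            for v, s in from_node(w).items():
                scores[v] = scores.get(v, 0.0) + beta * s
        return scores

    return {(u, v): s for u in children for v, s in from_node(u).items()}


def hierarchy_similarity(edges_old, edges_new, beta=0.5):
    """Illustrative similarity in [0, 1] between two hierarchy versions.

    This normalization is a hypothetical choice for the example, not the
    paper's definition: 1.0 means identical Katz scores for every node pair.
    """
    old = katz_scores(edges_old, beta)
    new = katz_scores(edges_new, beta)
    pairs = set(old) | set(new)
    diff = sum(abs(old.get(p, 0.0) - new.get(p, 0.0)) for p in pairs)
    total = sum(old.values()) + sum(new.values())
    return 1.0 if total == 0 else 1.0 - diff / total


if __name__ == "__main__":
    # Two toy versions: "Dog" moves from under "Animal" to directly under "Thing".
    v1 = [("Thing", "Animal"), ("Animal", "Dog")]
    v2 = [("Thing", "Animal"), ("Thing", "Dog")]
    print(hierarchy_similarity(v1, v1))
    print(hierarchy_similarity(v1, v2))
```

In this sketch a smaller `beta` discounts long paths more strongly, so changes near the compared nodes dominate the score; that is one plausible way such a measure could be tuned, offered here as an assumption rather than the paper's mechanism. The recursion is adequate for small examples only; a large hierarchy would call for an iterative traversal in topological order.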