Discovering Meta-Paths in Large Heterogeneous Information Networks
Authors: | Silviu Maniu, Changping Meng, Reynold Cheng, Wangda Zhang, Pierre Senellart |
---|---|
Contributors: | Télécom Paristech, Admin, Department of Computer Science [Hong Kong], City University of Hong Kong [Hong Kong] (CUHK), Data, Intelligence and Graphs (DIG), Laboratoire Traitement et Communication de l'Information (LTCI), Institut Mines-Télécom [Paris] (IMT)-Télécom Paris-Institut Mines-Télécom [Paris] (IMT)-Télécom Paris, Département Informatique et Réseaux (INFRES), Télécom ParisTech |
Year of publication: | 2015 |
Subject: | Class (computer programming); [INFO.INFO-DB] Computer Science [cs]/Databases [cs.DB]; [INFO.INFO-WB] Computer Science [cs]/Web; Theoretical computer science; Computer science; 02 engineering and technology; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; 020204 information systems; Data structure; Domain (software engineering); Node (computer science); Scalability; Enhanced Data Rates for GSM Evolution; Data mining; Greedy algorithm; Heterogeneous network |
Source: | WWW, May 2015, Florence, Italy |
Description: | The Heterogeneous Information Network (HIN) is a graph data model in which nodes and edges are annotated with class and relationship labels. Large and complex datasets, such as Yago or DBLP, can be modeled as HINs. Recent work has studied how to make use of these rich information sources. In particular, meta-paths, which represent sequences of node classes and edge types between two nodes in a HIN, have been proposed for such tasks as information retrieval, decision making, and product recommendation. Current methods assume meta-paths are found by domain experts. However, in a large and complex HIN, retrieving meta-paths manually can be tedious and difficult. We thus study how to discover meta-paths automatically. Specifically, users are asked to provide example pairs of nodes that exhibit high proximity. We then investigate how to generate meta-paths that can best explain the relationship between these node pairs. Since this problem is computationally intractable, we propose a greedy algorithm to select the most relevant meta-paths. We also present a data structure to enable efficient execution of this algorithm. We further incorporate hierarchical relationships among node classes in our solutions. Extensive experiments on real-world HINs show that our approach captures important meta-paths in an efficient and scalable manner. |
Database: | OpenAIRE |
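
To make the notion of a meta-path in the description concrete, the following is a toy sketch and not the authors' algorithm or data structure. It assumes a tiny hand-made HIN (the node names, classes, and edge labels are hypothetical), enumerates the meta-paths linking each example pair, and then greedily picks the meta-paths that link the most example pairs not yet explained. The names `meta_paths`, `greedy_select`, `k`, and `max_len` are illustrative only.

```python
from collections import defaultdict

# Toy HIN (hypothetical data): node -> class, plus labeled directed edges.
node_class = {
    "alice": "Author", "bob": "Author", "carol": "Author",
    "p1": "Paper", "p2": "Paper", "www": "Venue",
}
edges = [
    ("alice", "writes", "p1"), ("bob", "writes", "p1"),
    ("carol", "writes", "p2"),
    ("p1", "publishedIn", "www"), ("p2", "publishedIn", "www"),
]

# Adjacency in both directions; "^-1" marks an edge followed in reverse.
adj = defaultdict(list)
for src, label, dst in edges:
    adj[src].append((label, dst))
    adj[dst].append((label + "^-1", src))


def meta_paths(source, target, max_len=4):
    """Return the meta-paths (at most max_len edges) linking source to target.

    A meta-path is a tuple alternating node classes and edge labels.
    """
    found = set()
    stack = [(source, (node_class[source],), {source})]
    while stack:
        node, path, visited = stack.pop()
        if node == target and len(path) > 1:
            found.add(path)
            continue
        if (len(path) - 1) // 2 >= max_len:  # number of edges used so far
            continue
        for label, nxt in adj[node]:
            if nxt not in visited:
                stack.append(
                    (nxt, path + (label, node_class[nxt]), visited | {nxt})
                )
    return found


def greedy_select(pairs, k=2, max_len=4):
    """Pick up to k meta-paths, each time the one linking most uncovered pairs."""
    covers = defaultdict(set)
    for pair in pairs:
        for mp in meta_paths(pair[0], pair[1], max_len=max_len):
            covers[mp].add(pair)
    chosen, uncovered = [], set(pairs)
    while uncovered and len(chosen) < k:
        # sorted() makes tie-breaking deterministic.
        best = max(sorted(covers), key=lambda mp: len(covers[mp] & uncovered),
                   default=None)
        if best is None or not covers[best] & uncovered:
            break
        chosen.append(best)
        uncovered -= covers[best]
    return chosen


if __name__ == "__main__":
    examples = [("alice", "bob"), ("bob", "alice"), ("alice", "carol")]
    for mp in greedy_select(examples, k=2):
        print(" -> ".join(mp))
```

Exhaustive enumeration as done here only works on very small graphs. The description itself notes that the real problem is computationally intractable, which is why the paper relies on a greedy algorithm backed by a dedicated data structure.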