Simple negative sampling for link prediction in knowledge graphs

Authors: Md Kamrul Islam, Sabeur Aridhi, Malika Smail-Tabbone
Contributors: Computational Algorithms for Protein Structures and Interactions (CAPSID); Inria Nancy - Grand Est; Institut National de Recherche en Informatique et en Automatique (Inria); Department of Complex Systems, Artificial Intelligence & Robotics (LORIA - AIS); Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA); Université de Lorraine (UL); Centre National de la Recherche Scientifique (CNRS)
Language: English
Year of publication: 2021
Subject:
Source: Studies in Computational Intelligence
The 10th International Conference on Complex Networks and their Applications, Nov 2021, Madrid, Spain. pp.549-562, ⟨10.1007/978-3-030-93413-2_46⟩
Complex Networks & Their Applications X ISBN: 9783030934125
DOI: 10.1007/978-3-030-93413-2_46
Description: Knowledge graph (KG) embedding methods learn low-dimensional vector representations of the entities and relations of a knowledge graph, which facilitates the link prediction task in knowledge graphs. When learning embeddings, sampling negative triples is important because KGs contain only observed positive triples. To the best of our knowledge, uniform-random, generative adversarial network (GAN)-based, NSCaching, and structure-aware negative sampling (SANS) are the four negative sampling methods in the literature. Unfortunately, they suffer from computational and memory inefficiency. In addition, their prediction performance is affected by the 'vanishing gradient' problem because of the poor quality of the sampled negative triples. In this paper, we propose a simple negative sampling (SNS) method based on the assumption that entities which are closer in the embedding space to the corrupted entity provide high-quality negative triples. Furthermore, SNS has good exploitation potential, as it uses the sampled high-quality negatives to improve the quality of negative triples in subsequent steps. We evaluate our sampling method on the link prediction task using five well-known knowledge graph datasets: WN18, WN18RR, FB15K, FB15K-237, and YAGO3-10. The method is also evaluated on a new biological KG dataset (FIGHT-HF-23R). Experimental results show that SNS improves the prediction performance of KG embedding models and outperforms the existing sampling methods.
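The core assumption in the description, that entities close to the corrupted entity in embedding space yield hard negatives, can be illustrated with a minimal sketch. This is not the authors' SNS algorithm: the function name `sample_nearby_negative`, the parameter `k`, the use of Euclidean distance, and the toy embeddings are all assumptions made for illustration, and the exploitation step that reuses earlier negatives is not reproduced.

```python
import numpy as np


def sample_nearby_negative(triple, entity_emb, known_triples, k=5, rng=None):
    """Corrupt the tail of `triple` with an entity close to it in embedding space.

    Hypothetical helper: tail corruption only, Euclidean distance assumed.
    """
    if rng is None:
        rng = np.random.default_rng()
    h, r, t = triple
    # Distance from the true tail to every entity embedding.
    dists = np.linalg.norm(entity_emb - entity_emb[t], axis=1)
    # Rank entities nearest first and drop the true tail itself.
    order = [int(e) for e in np.argsort(dists) if int(e) != t]
    # Keep the k nearest entities that do not form an observed positive triple.
    candidates = [e for e in order if (h, r, e) not in known_triples][:k]
    if not candidates:
        raise ValueError("no valid negative entity available")
    # Pick one of the nearby candidates uniformly at random.
    return (h, r, int(rng.choice(candidates)))


# Toy data (invented): five entities in 2-D, one relation with id 0.
entity_emb = np.array(
    [[0.0, 0.0], [0.1, 0.0], [0.2, 0.0], [5.0, 5.0], [0.0, 0.3]]
)
known = {(3, 0, 0), (3, 0, 1)}  # observed positive triples
negative = sample_nearby_negative(
    (3, 0, 0), entity_emb, known, k=2, rng=np.random.default_rng(0)
)
print(negative)
```

In this toy setup entity 1 is the nearest neighbour of the true tail but is skipped because (3, 0, 1) is an observed positive, so the negative tail is drawn from entities 2 and 4. A uniform-random sampler could instead pick the distant entity 3, which a trained model would score as trivially false.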
Database: OpenAIRE