A systematic study of knowledge graph analysis for cross-language plagiarism detection

Author: Manuel Montes-y-Gómez, Paolo Rosso, Marc Franco-Salvador
Year of publication: 2016
Subject:
Scheme (programming language)
Vocabulary
Multilingual semantic network
Computer science
02 engineering and technology
Distributed representations
Library and Information Sciences
Management Science and Operations Research
020204 information systems
Component (UML)
Knowledge graphs
0202 electrical engineering, electronic engineering, information engineering
Media Technology
Relevance (information retrieval)
Plagiarism detection
Evaluation
Representation (mathematics)
Information retrieval
Knowledge economy
Computer Science Applications
Weighting
020201 artificial intelligence & image processing
Artificial intelligence
LENGUAJES Y SISTEMAS INFORMATICOS
Natural language processing
Cross-language
Information Systems
Source: RiuNet. Repositorio Institucional de la Universitat Politècnica de València
ISSN: 0306-4573
DOI: 10.1016/j.ipm.2015.12.004
Description: Cross-language plagiarism detection aims to detect plagiarised fragments of text among documents in different languages. In this paper, we perform a systematic examination of Cross-language Knowledge Graph Analysis, an approach that represents text fragments using knowledge graphs as a language-independent content model. We analyse the contributions to cross-language plagiarism detection of the different aspects covered by knowledge graphs: word sense disambiguation, vocabulary expansion, and representation by similarities with a collection of concepts. In addition, we study the relevance of both concepts and their relations when detecting plagiarism. Finally, as a key component of the knowledge graph construction, we present a new weighting scheme of relations between concepts based on distributed representations of concepts. Experimental results in Spanish–English and German–English plagiarism detection show state-of-the-art performance and provide interesting insights into the use of knowledge graphs. © 2015 Elsevier Ltd. All rights reserved.
This research has been carried out in the framework of the European Commission WIQ-EI IRSES (No. 269180) and DIANA APPLICATIONS - Finding Hidden Knowledge in Texts: Applications (TIN2012-38603-C02-01) projects. We would like to thank Tomas Mikolov, Martin Potthast, and Luis A. Leiva for their support and comments during this research.
Database: OpenAIRE