A systematic study of knowledge graph analysis for cross-language plagiarism detection
Author: | Manuel Montes-y-Gómez, Paolo Rosso, Marc Franco-Salvador |
---|---|
Year of publication: | 2016 |
Subject: | Scheme (programming language); Vocabulary; Multilingual semantic network; Computer science; media_common.quotation_subject; 02 engineering and technology; Distributed representations; Library and Information Sciences; Management Science and Operations Research; computer.software_genre; 020204 information systems; Component (UML); Knowledge graphs; 0202 electrical engineering electronic engineering information engineering; Media Technology; Relevance (information retrieval); Plagiarism detection; Evaluation; Representation (mathematics); computer.programming_language; media_common; Information retrieval; business.industry; Knowledge economy; Computer Science Applications; Weighting; 020201 artificial intelligence & image processing; Artificial intelligence; business; LENGUAJES Y SISTEMAS INFORMATICOS; computer; Natural language processing; Cross-language; Information Systems |
Source: | RiuNet. Repositorio Institucional de la Universitat Politècnica de València |
ISSN: | 0306-4573 |
DOI: | 10.1016/j.ipm.2015.12.004 |
Description: | Cross-language plagiarism detection aims to detect plagiarised fragments of text among documents in different languages. In this paper, we perform a systematic examination of Cross-language Knowledge Graph Analysis, an approach that represents text fragments using knowledge graphs as a language-independent content model. We analyse the contributions to cross-language plagiarism detection of the different aspects covered by knowledge graphs: word sense disambiguation, vocabulary expansion, and representation by similarities with a collection of concepts. In addition, we study the relevance of both concepts and their relations when detecting plagiarism. Finally, as a key component of the knowledge graph construction, we present a new weighting scheme of relations between concepts based on distributed representations of concepts. Experimental results in Spanish–English and German–English plagiarism detection show state-of-the-art performance and provide interesting insights on the use of knowledge graphs. © 2015 Elsevier Ltd. All rights reserved. This research has been carried out in the framework of the European Commission WIQ-EI IRSES (No. 269180) and DIANA APPLICATIONS - Finding Hidden Knowledge in Texts: Applications (TIN2012-38603-C02-01) projects. We would like to thank Tomas Mikolov, Martin Potthast, and Luis A. Leiva for their support and comments during this research. |
Database: | OpenAIRE |
External link: |