A systematic study of knowledge graph analysis for cross-language plagiarism detection

Author:	Manuel Montes-y-Gómez, Paolo Rosso, Marc Franco-Salvador
Year of publication:	2016
Subject:	Scheme (programming language); Vocabulary; Multilingual semantic network; Computer science; Engineering and technology; Distributed representations; Library and Information Sciences; Management Science and Operations Research; Information systems; Component (UML); Knowledge graphs; Electrical engineering, electronic engineering, information engineering; Media Technology; Relevance (information retrieval); Plagiarism detection; Evaluation; Representation (mathematics); Information retrieval; Knowledge economy; Computer Science Applications; Weighting; Artificial intelligence & image processing; Artificial intelligence; LENGUAJES Y SISTEMAS INFORMATICOS; Natural language processing; Cross-language
Source:	RiuNet. Repositorio Institucional de la Universitat Politècnica de València
ISSN:	0306-4573
DOI:	10.1016/j.ipm.2015.12.004
Description:	Cross-language plagiarism detection aims to detect plagiarised fragments of text among documents in different languages. In this paper, we perform a systematic examination of Cross-language Knowledge Graph Analysis, an approach that represents text fragments using knowledge graphs as a language-independent content model. We analyse the contributions to cross-language plagiarism detection of the different aspects covered by knowledge graphs: word sense disambiguation, vocabulary expansion, and representation by similarities with a collection of concepts. In addition, we study both the relevance of concepts and their relations when detecting plagiarism. Finally, as a key component of the knowledge graph construction, we present a new weighting scheme of relations between concepts based on distributed representations of concepts. Experimental results in Spanish–English and German–English plagiarism detection show state-of-the-art performance and provide interesting insights on the use of knowledge graphs. © 2015 Elsevier Ltd. All rights reserved. This research has been carried out in the framework of the European Commission WIQ-EI IRSES (No. 269180) and DIANA APPLICATIONS - Finding Hidden Knowledge in Texts: Applications (TIN2012-38603-C02-01) projects. We would like to thank Tomas Mikolov, Martin Potthast, and Luis A. Leiva for their support and comments during this research.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::e2bfc2cc8f95938e8c7ec7a7bb480e6d https://doi.org/10.1016/j.ipm.2015.12.004