Computing inter-document similarity with Context Semantic Analysis
Authors: | Giovanni Simonini, Sonia Bergamaschi, Domenico Beneventano, Fabio Benedetti |
---|---|
Year of publication: | 2019 |
Subject: | Computer science; Semantic analysis (machine learning); Inter-document similarity; Context (language use); 02 engineering and technology; Task (project management); Domain (software engineering); Knowledge base; Knowledge graph; Similarity measures; Information retrieval; 020204 information systems; Similarity (psychology); 0202 electrical engineering, electronic engineering, information engineering; RDF; Base (topology); Hardware and Architecture; 020201 artificial intelligence & image processing; Software; Information Systems |
Source: | Information Systems. 80:136-147 |
ISSN: | 0306-4379 |
DOI: | 10.1016/j.is.2018.02.009 |
Description: | We propose a novel knowledge-based technique for inter-document similarity computation, called Context Semantic Analysis (CSA). Several specialized approaches built on top of specific knowledge bases (e.g., Wikipedia) exist in the literature, but CSA differs from them because it is designed to be portable to any RDF knowledge base. In fact, our technique relies on a generic RDF knowledge base (e.g., DBpedia or Wikidata) to extract a Semantic Context Vector, a novel model for representing the context of a document, which CSA exploits to compute inter-document similarity effectively. Moreover, we show how CSA can be effectively applied in the Information Retrieval domain. Experimental results show that: (i) for the general task of inter-document similarity, CSA outperforms baselines built on top of traditional methods and achieves performance similar to that of approaches built on top of specific knowledge bases; (ii) for Information Retrieval tasks, enriching documents with context (i.e., employing the Semantic Context Vector model) improves result quality over the state-of-the-art technique that employs a similar semantic enrichment. |
Database: | OpenAIRE |
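
The record describes CSA only at a high level. As a rough illustration of why knowledge-base context can help inter-document similarity, here is a minimal Python sketch of a generic technique: expand each document's linked entities with their one-hop neighbours in an RDF-style graph, then compare the resulting sparse vectors with cosine similarity. This is not the authors' Semantic Context Vector or CSA algorithm. The toy triples, the `neighbour_weight` parameter and all function names are hypothetical, and entity linking (mapping document text to knowledge-base entities) is assumed to have been done already.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy knowledge graph of (subject, predicate, object) triples,
# standing in for an RDF knowledge base such as DBpedia or Wikidata.
TRIPLES = [
    ("Rome", "capitalOf", "Italy"),
    ("Milan", "locatedIn", "Italy"),
    ("Italy", "memberOf", "EU"),
    ("Paris", "capitalOf", "France"),
    ("France", "memberOf", "EU"),
    ("Python", "type", "ProgrammingLanguage"),
]


def neighbours(entity, triples):
    """Return entities one hop away from `entity`, ignoring edge direction."""
    out = []
    for s, _, o in triples:
        if s == entity:
            out.append(o)
        elif o == entity:
            out.append(s)
    return out


def context_vector(entities, triples, neighbour_weight=0.5):
    """Build a sparse context vector (hypothetical weighting scheme):
    each linked entity adds 1.0, each one-hop neighbour adds `neighbour_weight`."""
    vec = Counter()
    for e in entities:
        vec[e] += 1.0
        for n in neighbours(e, triples):
            vec[n] += neighbour_weight
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    doc_rome = context_vector(["Rome"], TRIPLES)
    doc_milan = context_vector(["Milan"], TRIPLES)
    doc_python = context_vector(["Python"], TRIPLES)
    print(cosine(doc_rome, doc_milan))   # overlap via the shared neighbour "Italy"
    print(cosine(doc_rome, doc_python))  # no shared entities or neighbours
```

Here documents mentioning only `Rome` and only `Milan` share no entity, so a plain entity-overlap comparison (equivalently, `neighbour_weight=0.0`) scores them 0, while the expanded vectors overlap on `Italy` and score about 0.2. A real system would draw neighbours from a full RDF graph and use a more principled weighting than this fixed constant.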