Semantic similarity computation and word sense induction using hidden sets multidimensional scaling

Author: Athanasopoulou Georgia
Contributors: Potamianos Alexandros (Ποταμιανος Αλεξανδρος), Koutsakis Polychronis (Κουτσακης Πολυχρονης), Liavas Athanasios (Λιαβας Αθανασιος); Advisor: Koutsakis Polychronis; Committee member: Liavas Athanasios
Language: English
Description (abstract): In this thesis, motivated by evidence from psycholinguistics and cognition, we propose an unsupervised, language-agnostic Distributional Semantic Model (DSM) that utilizes web-harvested data for the problem of semantic similarity estimation. Semantic similarity can be applied to numerous Natural Language Processing (NLP) tasks, such as affective text analysis and paraphrasing. In the first part of the thesis, we present the construction of typical DSMs following the well-established Vector Space Model. More specifically, we describe the creation of corpora by harvesting web documents with a query-based approach, as well as the state-of-the-art DSMs used to compute semantic similarity from these corpora. Next, we propose a novel hierarchical DSM, also inspired by evidence from psycholinguistics and cognition, that consists of low-dimensional manifolds built on semantic neighborhoods. Each manifold is sparsely encoded and mapped into a low-dimensional space. Global operations are decomposed into local operations in multiple sub-spaces, and the results of these local operations are fused to produce semantic relatedness estimates. Manifold DSMs are constructed starting from a pairwise word-level semantic similarity matrix. The proposed model is evaluated against state-of-the-art and baseline DSMs on the semantic similarity estimation task, where the similarity scores are compared with human similarity ratings. The proposed model significantly improves performance compared to the baseline approaches for semantic similarity estimation between words. Furthermore, the proposed model is evaluated on a taxonomy task, achieving state-of-the-art results. Finally, motivated by evidence that concepts are cognitively organized according to their degree of concreteness, we present the performance of the proposed DSM for abstract and concrete nouns.
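For illustration only, a Vector Space Model baseline of the kind the abstract mentions can be sketched as follows. This assumes each word is represented by raw co-occurrence counts with the words inside a fixed window of a toy corpus, and that similarity is the cosine of two such count vectors. The window size, the unweighted counts, the corpus and the function names `context_vectors` and `cosine` are hypothetical choices, not the configuration used in the thesis.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=2):
    """Count how often each word co-occurs with words within `window` positions (hypothetical setup)."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse count vectors; 0.0 if either vector is empty."""
    dot = sum(count * v[ctx] for ctx, count in u.items())  # Counter returns 0 for missing keys
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus standing in for web-harvested documents.
corpus = [
    "the cat drinks milk".split(),
    "the dog drinks water".split(),
    "the car needs fuel".split(),
]
vecs = context_vectors(corpus)
print(round(cosine(vecs["cat"], vecs["dog"]), 3))  # 0.667
print(round(cosine(vecs["cat"], vecs["car"]), 3))  # 0.333
```

"cat" and "dog" share two context words ("the", "drinks") and so score higher than "cat" and "car", which share only "the".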
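The manifold DSM idea described in the abstract (semantic neighborhoods, a low-dimensional mapping per neighborhood, and fusion of local estimates) can be sketched roughly as below. This is a simplified, hypothetical reading, not the thesis's algorithm. It assumes each neighborhood is a word plus its `k` most similar words in the input similarity matrix. It uses classical multidimensional scaling (in line with the title) to map each neighborhood into `dim` dimensions and turns local Euclidean distances into similarities with 1/(1 + d). Fusion is a plain average over neighborhoods, and the sparse encoding step is omitted. All function names, parameters and the toy matrix are illustrative.

```python
import numpy as np


def classical_mds(dist, dim):
    """Classical (Torgerson) MDS: coordinates whose Euclidean distances approximate `dist`."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:dim]               # largest eigenvalues first
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))   # clip tiny negative eigenvalues
    return eigvecs[:, top] * scale


def manifold_similarity(sim, k=3, dim=2):
    """Hypothetical sketch: fuse similarities computed in local MDS sub-spaces."""
    n = sim.shape[0]
    total = np.zeros((n, n))
    counts = np.zeros((n, n))
    for i in range(n):
        # Semantic neighborhood: word i plus its k most similar other words.
        order = [j for j in np.argsort(-sim[i]) if j != i][:k]
        hood = np.array([i] + order)
        local_dist = 1.0 - sim[np.ix_(hood, hood)]      # similarity -> dissimilarity (assumed)
        np.fill_diagonal(local_dist, 0.0)
        coords = classical_mds(local_dist, dim)
        # Local similarity from Euclidean distance inside the sub-space.
        diff = coords[:, None, :] - coords[None, :, :]
        local_sim = 1.0 / (1.0 + np.linalg.norm(diff, axis=-1))
        total[np.ix_(hood, hood)] += local_sim
        counts[np.ix_(hood, hood)] += 1
    # Average over neighborhoods containing both words; keep the input score otherwise.
    return np.where(counts > 0, total / np.maximum(counts, 1), sim)


# Hypothetical pairwise similarity matrix for: cat, dog, car, bus.
sim = np.array([
    [1.0, 0.8, 0.2, 0.1],
    [0.8, 1.0, 0.1, 0.3],
    [0.2, 0.1, 1.0, 0.7],
    [0.1, 0.3, 0.7, 1.0],
])
fused = manifold_similarity(sim, k=2, dim=2)
print(np.round(fused, 3))
```

In this toy matrix every word pair shares at least one three-word neighborhood, and each such neighborhood embeds exactly in two dimensions, so the fused scores keep the input ranking. Pairs that share no neighborhood simply fall back to their input similarity in this sketch.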
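The abstract evaluates similarity scores against human ratings but does not name the measure. A common choice for such word-pair benchmarks is Spearman's rank correlation between model scores and human scores over the same pairs. The sketch below assumes that measure and computes it as the Pearson correlation of average ranks, so that ties are handled. The word pairs and all numbers are made up for illustration.

```python
def average_ranks(values):
    """Rank values (1 = smallest); tied values get the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical pairs: (car, automobile), (cup, mug), (bird, cloud), (noon, string)
human = [9.2, 7.5, 3.1, 1.0]     # made-up human ratings
model = [0.83, 0.40, 0.61, 0.12]  # made-up model similarity scores
print(round(spearman(human, model), 3))  # 0.8
```

Rank correlation only rewards getting the ordering of pairs right, so it does not matter that the model's scores and the human ratings live on different scales.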
Database: OpenAIRE