Domain adaptation for hypernym identification via automatic collection of domain-specific training data

Author: Palm Myllylä, Johannes
Language: English
Year of publication: 2019
Subject:
Description: Identifying semantic relations in natural language text is an important component of many knowledge extraction systems. This thesis studies the task of hypernym discovery, i.e., discovering terms that are related by the hypernymy (is-a) relation. Specifically, it explores how state-of-the-art methods for hypernym discovery perform when applied in specific language domains. Current state-of-the-art methods for hypernym discovery are mostly supervised machine learning models that leverage distributional word representations such as word embeddings. These models require labeled training data in the form of term pairs that are known to be related by hypernymy. Such labeled training data is often not available when working with a specific language domain. This thesis presents experiments with an automatic training data collection algorithm. The algorithm leverages a pre-defined domain-specific vocabulary and the lexical resource WordNet to extract training pairs automatically. The thesis contributes experimental results on using such automatically collected domain-specific training data for domain adaptation. Experiments are conducted in two domains: one with a large amount of text data and one with a much smaller amount. Results show that the automatically collected training data has a positive impact on performance in both domains. The performance boost is largest in the domain with a large amount of text data, where mean average precision increases by up to 8 points.
Database: OpenAIRE
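
The description mentions extracting training pairs from a pre-defined domain vocabulary and WordNet, but gives no details of the algorithm. The following is only a minimal sketch of one way such a collection step could look, not the thesis's actual method. It assumes that a pair is kept when both terms are in the vocabulary and one is a direct or inherited hypernym of the other. The dictionary `TOY_HYPERNYMS` is a hypothetical stand-in for WordNet's hypernymy links, and the function names are illustrative.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical stand-in for WordNet: term -> its direct hypernyms.
# These entries are illustrative toy data, not real WordNet output.
TOY_HYPERNYMS: Dict[str, List[str]] = {
    "aspirin": ["analgesic"],
    "analgesic": ["drug"],
    "drug": ["substance"],
    "insulin": ["hormone"],
}


def all_hypernyms(term: str, direct: Dict[str, List[str]]) -> Set[str]:
    """Return every hypernym reachable from `term` (direct or inherited)."""
    seen: Set[str] = set()
    stack = list(direct.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(direct.get(current, []))
    return seen


def collect_training_pairs(
    vocabulary: Set[str], direct: Dict[str, List[str]]
) -> List[Tuple[str, str]]:
    """Collect (hyponym, hypernym) pairs where both terms are in the vocabulary."""
    pairs: List[Tuple[str, str]] = []
    for term in sorted(vocabulary):
        for hypernym in sorted(all_hypernyms(term, direct)):
            if hypernym in vocabulary:
                pairs.append((term, hypernym))
    return pairs


# Assumed domain vocabulary; "substance" is deliberately left out.
vocabulary = {"aspirin", "analgesic", "drug", "insulin", "hormone"}
pairs = collect_training_pairs(vocabulary, TOY_HYPERNYMS)
for hyponym, hypernym in pairs:
    print(f"{hyponym} is-a {hypernym}")
```

In a real setting the toy dictionary would be replaced by lookups in WordNet itself, and the resulting pairs would serve as positive examples for a supervised hypernym discovery model.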