Mining Relations from Unstructured Content

Authors: Steve Welch, Anna Lisa Gentile, Daniel Gruhl, Ismini Lourentzou, Anni R. Coden, Alfredo Alba
Publication year: 2018
Subject:
Source: Advances in Knowledge Discovery and Data Mining ISBN: 9783319930367
PAKDD (2)
Description: Extracting relations from unstructured Web content is a challenging task, and for any new relation a significant effort is required to design, train and tune the extraction models. In this work, we investigate how to obtain suitable results for relation extraction with modest human effort, relying on a dynamic active learning approach. We propose a method to reliably generate high-quality training/test data for relation extraction for any generic user-demonstrated relation, starting from a few user-provided examples and extracting valuable samples from unstructured and unlabeled Web content. To this end, we propose a strategy that learns the best order in which to human-annotate data, maximizing learning performance early in the process. We demonstrate the viability of the approach (i) against state-of-the-art datasets for relation extraction and (ii) in a real case study identifying text that expresses a causal relation between a drug and an adverse reaction in user-generated Web content.
Database: OpenAIRE