Mining Relations from Unstructured Content
Authors: | Steve Welch, Anna Lisa Gentile, Daniel Gruhl, Ismini Lourentzou, Anni R. Coden, Alfredo Alba |
---|---|
Year of publication: | 2018 |
Subjects: | Relation (database); Computer science; Active learning (machine learning); Process (engineering); Machine learning; Relationship extraction; Task (project management); Quality (business); Artificial intelligence; Web content; Test data; 01 natural sciences; 0105 earth and related environmental sciences; 010501 environmental sciences; 02 engineering and technology; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing |
Source: | Advances in Knowledge Discovery and Data Mining, PAKDD (2); ISBN: 9783319930367 |
Description: | Extracting relations from unstructured Web content is a challenging task, and any new relation requires significant effort to design, train, and tune the extraction models. In this work, we investigate how to obtain suitable results for relation extraction with modest human effort, relying on a dynamic active learning approach. We propose a method to reliably generate high-quality training and test data for relation extraction for any generic user-demonstrated relation, starting from a few user-provided examples and extracting valuable samples from unstructured and unlabeled Web content. To this end, we propose a strategy that learns the best order in which to present data for human annotation, maximizing learning performance early in the process. We demonstrate the viability of the approach (i) against state-of-the-art datasets for relation extraction and (ii) in a real case study identifying text that expresses a causal relation between a drug and an adverse reaction in user-generated Web content. |
Database: | OpenAIRE |