Diamonds in the Rough: Event Extraction from Imperfect Microblog Data
Authors: | Oier Lopez de Lacalle, Mihai Surdeanu, Eneko Agirre, Ander Intxaurrondo |
---|---|
Year of publication: | 2015 |
Subject: | |
Source: | HLT-NAACL Scopus-Elsevier |
DOI: | 10.3115/v1/n15-1066 |
Description: | We introduce a distantly supervised event extraction approach that extracts complex event templates from microblogs. We show that this near real-time data source is more challenging than news because it contains information that is both approximate (e.g., with values that are close to but different from the gold truth) and ambiguous (due to the brevity of the texts), impacting both the evaluation and extraction methods. For the former, we propose a novel “soft” F1 metric that incorporates similarity between extracted fillers and the gold truth, giving partial credit to different but similar values. With respect to extraction methodology, we propose two extensions to the distant supervision paradigm: to address approximate information, we allow positive training examples to be generated from information that is similar but not identical to gold values; to address ambiguity, we aggregate contexts across tweets discussing the same event. We evaluate our contributions on the complex domain of earthquakes, where events have up to 20 arguments. Our results indicate that, despite their simplicity, our contributions yield a statistically significant improvement of 33% (relative) over a strong distantly supervised system. The dataset containing the knowledge base, relevant tweets and manual annotations is publicly available. |
Database: | OpenAIRE |
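
The "soft" F1 idea in the description, giving partial credit to extracted fillers that are close to the gold value, can be illustrated with a minimal sketch. The record does not give the paper's actual similarity function, so everything here is an assumption: the `similarity` function (one minus relative difference for numbers, case-insensitive exact match for strings), the `soft_f1` helper, and the slot names `magnitude`, `depth_km`, and `location` are hypothetical and chosen only for illustration.

```python
from typing import Dict, Union

Value = Union[str, int, float]


def similarity(pred: Value, gold: Value) -> float:
    """Hypothetical similarity in [0, 1]; NOT the paper's definition.

    Numbers: 1 minus the relative difference, clipped at 0.
    Anything else: 1.0 on a case-insensitive exact match, otherwise 0.0.
    """
    if isinstance(pred, (int, float)) and isinstance(gold, (int, float)):
        if pred == gold:
            return 1.0
        denom = max(abs(pred), abs(gold))  # > 0 because pred != gold
        return max(0.0, 1.0 - abs(pred - gold) / denom)
    return 1.0 if str(pred).strip().lower() == str(gold).strip().lower() else 0.0


def soft_f1(predicted: Dict[str, Value], gold: Dict[str, Value]) -> float:
    """Soft F1 over one event template (slot name -> filler value).

    Each predicted slot earns credit equal to its similarity to the gold
    filler, instead of the 0/1 credit used by a standard ("hard") F1.
    """
    credit = sum(
        similarity(value, gold[slot])
        for slot, value in predicted.items()
        if slot in gold
    )
    precision = credit / len(predicted) if predicted else 0.0
    recall = credit / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative (made-up) earthquake template; the depth is close but not exact.
gold_event = {"magnitude": 6.0, "depth_km": 10.0, "location": "Chile"}
pred_event = {"magnitude": 6.0, "depth_km": 8.0, "location": "chile"}
score = soft_f1(pred_event, gold_event)
```

Under these assumptions the near-miss depth earns 0.8 credit rather than 0, so the template scores about 0.93, whereas an exact-match F1 would count only two of the three slots as correct.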