Robust Neural Relation Extraction via Multi-Granularity Noises Reduction
Author: | Tianyi Liu, Pengshuai Li, Hai Zhao, Weijia Jia, Xinsong Zhang |
---|---|
Year of publication: | 2021 |
Subject: | Artificial neural network; Relation (database); Computer science; business.industry; Feature extraction; Pattern recognition; 02 engineering and technology; Relationship extraction; Computer Science Applications; Computational Theory and Mathematics; 020204 information systems; 0202 electrical engineering, electronic engineering, information engineering; Noise (video); Artificial intelligence; Transfer of learning; business; Sentence; Information Systems |
Source: | IEEE Transactions on Knowledge and Data Engineering. 33:3297-3310 |
ISSN: | 2326-3865 1041-4347 |
DOI: | 10.1109/tkde.2020.2964747 |
Description: | Distant supervision is widely used to extract relational facts with automatically labeled datasets, reducing the high cost of human annotation. However, current distantly supervised methods suffer from word-level and sentence-level noise, which stems from the large proportion of irrelevant words in a sentence and from inaccurate relation labels on numerous sentences. These problems lead to unacceptable precision in relation extraction and are critical to the success of distant supervision. In this paper, we propose a novel and robust neural approach that addresses both problems by reducing the influence of multi-granularity noise. Three levels of noise, from word and sentence to knowledge type, are carefully considered in this work. We first introduce a question-answering based relation extractor (QARE) to remove noisy words in a sentence. Then we use multi-focus multi-instance learning (MMIL) to alleviate the effects of sentence-level noise by making proper use of wrongly labeled sentences. Finally, to strengthen our method against all of these noise types, we initialize its parameters with a priori knowledge learned from the related task of entity type classification via transfer learning. Extensive experiments on both an existing benchmark and an improved, larger dataset demonstrate that our proposed approach achieves new state-of-the-art performance. |
Database: | OpenAIRE |
External link: |
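
The sentence-level noise mentioned in the description can be illustrated with a minimal sketch of distant supervision labeling. This is not the paper's QARE or MMIL method; it only shows, under simplifying assumptions, how aligning a knowledge base with raw text yields wrongly labeled sentences. The knowledge base `KB`, the sentences, and the function `distant_label` are hypothetical and chosen for illustration, and entity matching is reduced to plain substring search.

```python
from collections import defaultdict

# Hypothetical toy knowledge base: (head entity, tail entity) -> relation.
KB = {
    ("Barack Obama", "Hawaii"): "born_in",
    ("Steve Jobs", "Apple"): "founder_of",
}

# Hypothetical raw sentences; the second one mentions both entities
# but does not express the "born_in" relation.
SENTENCES = [
    "Barack Obama was born in Hawaii in 1961.",
    "Barack Obama visited Hawaii for a holiday last winter.",
    "Steve Jobs co-founded Apple in a garage.",
    "Tim Cook now runs Apple.",
]


def distant_label(sentences, kb):
    """Group sentences into bags keyed by (head, tail, relation).

    Every sentence that mentions both entities of a KB fact receives
    that fact's relation label, whether or not it expresses it.
    """
    bags = defaultdict(list)
    for sent in sentences:
        for (head, tail), relation in kb.items():
            # Simplifying assumption: substring match instead of entity linking.
            if head in sent and tail in sent:
                bags[(head, tail, relation)].append(sent)
    return dict(bags)


bags = distant_label(SENTENCES, KB)
for key, bag in bags.items():
    print(key, len(bag))
```

In this toy run, the bag for ("Barack Obama", "Hawaii", "born_in") contains two sentences, one of which is a wrong label. Multi-instance learning approaches treat each such bag, rather than each sentence, as the training unit for that reason.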