Recognizing Semantic Relations: Attention-Based Transformers vs. Recurrent Models
Author: | Dmitri Roussinov, Nadezhda Puchnina, Serge Sharoff |
---|---|
Year of publication: | 2020 |
Subject: | Training set, Source code, Computer science, Benchmarking, Machine learning, Query expansion, Information systems, Question answering, Artificial intelligence, Image processing, Error reduction, Knowledge transfer, Transformer (machine learning model) |
Source: | Lecture Notes in Computer Science, ISBN 9783030454388, ECIR (Part 1) |
DOI: | 10.1007/978-3-030-45439-5_37 |
Description: | Automatically recognizing an existing semantic relation (such as “is a”, “part of”, “property of”, “opposite of”, etc.) between two arbitrary words (phrases, concepts, etc.) is an important task affecting many information retrieval and artificial intelligence applications, including query expansion, common-sense reasoning, question answering, and database federation. Currently, two classes of approaches exist to classify a relation between words (concepts) X and Y: (1) path-based and (2) distributional. While path-based approaches look at word paths connecting X and Y in text, distributional approaches look at statistical properties of X and Y separately, not necessarily in proximity to each other. Here, we suggest how both types can be improved and empirically compare them on several standard benchmark datasets. For our distributional approach, we suggest using an attention-based transformer. While transformers are known to support knowledge transfer between different tasks and have recently set a number of benchmark records in various applications, we are the first to successfully apply them to the task of recognizing semantic relations. To improve the path-based approach, we suggest an original neural word-path model that combines useful properties of convolutional and recurrent networks, thus addressing several shortcomings of prior path-based models. Each of our models significantly outperforms the state of the art within its respective type. Our transformer-based approach outperforms the current state of the art by 1–12 percentage points on 4 out of 6 standard benchmark datasets. This amounts to a 15–40% error reduction and closes the gap between automated and human performance by up to 50%. It also needs much less training data than prior approaches. To ease reproduction of our results, we make our source code and trained models publicly available. (A minimal sketch of the transformer-based pair-classification formulation follows this record.) |
Database: | OpenAIRE |
External link: |
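
The abstract above casts relation recognition as classifying a word pair (X, Y) with a pretrained transformer. Below is a minimal sketch of that formulation, assuming a Hugging Face `transformers` BERT-base encoder and a hypothetical five-way label set; the paper's actual model, training setup, and label inventory may differ.

```python
# Sketch (not the authors' exact code): semantic relation recognition as
# word-pair classification with a pretrained attention-based transformer.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Hypothetical label set, loosely following the relations named in the abstract.
LABELS = ["is-a", "part-of", "property-of", "opposite-of", "no-relation"]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(LABELS)
)

# Encode the pair (X, Y) as a single two-segment input,
# [CLS] X [SEP] Y [SEP]; the classification head reads the [CLS] vector.
inputs = tokenizer("wheel", "car", return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, len(LABELS))

# The freshly initialized head gives an arbitrary answer; after fine-tuning
# on pairs labeled with the relations above, this becomes the predicted
# semantic relation between X and Y.
print(LABELS[logits.argmax(dim=-1).item()])
```

Encoding both words in one two-segment input lets self-attention relate X and Y directly, which fits the distributional framing described in the abstract: unlike path-based models, no connecting word path between X and Y in a corpus is required.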