How Far Can We Go with Data Selection? A Case Study on Semantic Sequence Tagging Tasks

Author:	Bernardo Magnini, Samuel Louvan
Year of publication:	2020
Subject:	Sequence; Computer science; Artificial intelligence; Transfer of learning; Natural language processing; Data selection
Source:	Insights
DOI:	10.18653/v1/2020.insights-1.3
Description:	Although several works have addressed the role of data selection to improve transfer learning for various NLP tasks, there is no consensus about its real benefits and, more generally, there is a lack of shared practices on how it can be best applied. We propose a systematic approach aimed at evaluating data selection in scenarios of increasing complexity. Specifically, we compare the case in which source and target tasks are the same while source and target domains are different, against the more challenging scenario where both tasks and domains are different. We run a number of experiments on semantic sequence tagging tasks, which are relatively less investigated in data selection, and conclude that data selection has more benefit on the scenario when the tasks are the same, while in case of different (although related) tasks from distant domains, a combination of data selection and multi-task learning is ineffective for most cases.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::3b4bbd12753e77b0dfbaa9247ad4c276 https://doi.org/10.18653/v1/2020.insights-1.3