New Protocols and Negative Results for Textual Entailment Data Collection
Author: | Jennimaria Palomaki, Samuel R. Bowman, Emily Pitler, Livio Soares |
---|---|
Language: | English |
Year of publication: | 2020 |
Subject: |
Protocol (science);
FOS: Computer and information sciences; Data collection; Computer Science - Computation and Language; Computer science; business.industry; 02 engineering and technology; 010501 environmental sciences; computer.software_genre; Crowdsourcing; 01 natural sciences; Annotation; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Artificial intelligence; Transfer of learning; Baseline (configuration management); business; Textual entailment; computer; Computation and Language (cs.CL); Natural language processing; 0105 earth and related environmental sciences |
Source: | EMNLP (1) |
Description: | Natural language inference (NLI) data has proven useful in benchmarking and, especially, as pretraining data for tasks requiring language understanding. However, the crowdsourcing protocol that was used to collect this data has known issues and was not explicitly optimized for either of these purposes, so it is likely far from ideal. We propose four alternative protocols, each aimed at improving either the ease with which annotators can produce sound training examples or the quality and diversity of those examples. Using these alternatives and a fifth baseline protocol, we collect and compare five new 8.5k-example training sets. In evaluations focused on transfer learning applications, our results are solidly negative, with models trained on our baseline dataset yielding good transfer performance to downstream tasks, but none of our four new methods (nor the recent ANLI) showing any improvements over that baseline. In a small silver lining, we observe that all four new protocols, especially those where annotators edit pre-filled text boxes, reduce previously observed issues with annotation artifacts. To appear at EMNLP 2020. |
Database: | OpenAIRE |
External link: |