Simple Data Augmentation with the Mask Token Improves Domain Adaptation for Dialog Act Tagging
Author: | Nitish Shirish Keskar, Caiming Xiong, Wenhao Liu, Kazuma Hashimoto, Richard Socher, Semih Yavuz |
---|---|
Year of publication: | 2020 |
Subject: |
Scheme (programming language); Computer science; Generalization; Security token; Domain (software engineering); Dialog act; Consistency (database systems); Artificial intelligence; Regularization (linguistics); Natural language processing; 05 social sciences; 0502 economics and business; 050207 economics; 01 natural sciences; 0105 earth and related environmental sciences; 010501 environmental sciences |
Source: | EMNLP (1) |
Description: | The concept of Dialogue Act (DA) is universal across different task-oriented dialogue domains: the act of "request" carries the same speaker intention whether it is for a restaurant reservation or a flight booking. However, DA taggers trained on one domain do not generalize well to other domains, which leaves us with the expensive need for a large amount of annotated data in the target domain. In this work, we investigate how to better adapt DA taggers to desired target domains with only unlabeled data. We propose MaskAugment, a controllable mechanism that augments text input by leveraging the pre-trained Mask token from the BERT model. Inspired by consistency regularization, we use MaskAugment to introduce an unsupervised teacher-student learning scheme to examine the domain adaptation of DA taggers. Our extensive experiments on the Simulated Dialogue (GSim) and Schema-Guided Dialogue (SGD) datasets show that MaskAugment is useful in improving the cross-domain generalization for DA tagging. |
Database: | OpenAIRE |
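
The description above names two ingredients: augmenting input text with BERT's mask token, and a consistency-style teacher-student objective. The following is only a minimal sketch of that general idea, not the authors' implementation. The abstract does not say how tokens are chosen for masking or which loss is used, so the per-token masking probability `mask_prob`, the helper names `mask_augment` and `kl_divergence`, and the choice of KL divergence as the consistency measure are all assumptions made for illustration.

```python
import math
import random

MASK_TOKEN = "[MASK]"  # the mask token string used by BERT vocabularies


def mask_augment(tokens, mask_prob, rng):
    """Return a copy of `tokens` where each token is replaced by [MASK]
    independently with probability `mask_prob`.

    `mask_prob` is an assumed control knob; the record does not specify
    how the original method controls the amount of masking.
    """
    return [MASK_TOKEN if rng.random() < mask_prob else tok for tok in tokens]


def kl_divergence(teacher_probs, student_probs, eps=1e-12):
    """KL(teacher || student) between two dialog-act label distributions.

    Used here as a stand-in consistency loss (an assumption): it is zero
    when the student's prediction on the augmented input matches the
    teacher's prediction on the original input.
    """
    return sum(
        p * math.log((p + eps) / (q + eps))
        for p, q in zip(teacher_probs, student_probs)
    )


if __name__ == "__main__":
    rng = random.Random(0)
    utterance = "i would like to book a table for two".split()
    augmented = mask_augment(utterance, mask_prob=0.3, rng=rng)
    print(augmented)

    # Hypothetical tagger outputs over three dialog acts, for illustration only.
    teacher = [0.7, 0.2, 0.1]   # prediction on the original utterance
    student = [0.6, 0.3, 0.1]   # prediction on the masked utterance
    print(kl_divergence(teacher, student))
```

In a setup like this, the consistency term needs no labels, which is why it can be computed on unlabeled target-domain utterances; how the original work combines it with supervised source-domain training is not stated in this record.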