Audio Embeddings Help to Learn Better Dialogue Policies

Authors: López Zorrilla, Asier; Torres Barañano, María Inés; Cuayáhuitl, Heriberto
Year of publication: 2021
Subject:
Source: Addi. Archivo Digital para la Docencia y la Investigación
DOI: 10.1109/asru51503.2021.9688296
Description: Presented at ASRU 2021, Cartagena (Colombia), 13-17 December 2021. Neural transformer architectures have gained a lot of interest for text-based dialogue management in the last few years. They have shown high learning capabilities for open-domain dialogue with huge amounts of data and also for domain adaptation in task-oriented setups. But the potential benefits of exploiting the users' audio signal have rarely been explored in such frameworks. In this work, we combine text dialogue history representations generated by a GPT-2 model with audio embeddings obtained by the recently released Wav2Vec2 transformer model. We jointly fine-tune these models to learn dialogue policies via supervised learning and two policy gradient-based reinforcement learning algorithms. Our experimental results, using the DSTC2 dataset and a simulated user model capable of sampling audio turns, reveal that audio embeddings lead to overall higher task success than the same setup without audio embeddings, with statistically significant results across evaluation metrics and training algorithms.
Database: OpenAIRE