Controlling High-Dimensional Data With Sparse Input
Author: | Iliescu, Dan Andrei; Mohan, Devang Savita Ram; Teh, Tian Huey; Hodari, Zack |
---|---|
Language: | English |
Year of publication: | 2023 |
Subject: |
FOS: Computer and information sciences;
Computer Science - Machine Learning; Artificial Intelligence (cs.AI); Computer Science - Computation and Language; Audio and Speech Processing (eess.AS); Computer Science - Artificial Intelligence; FOS: Electrical engineering, electronic engineering, information engineering; Computation and Language (cs.CL); Machine Learning (cs.LG); Electrical Engineering and Systems Science - Audio and Speech Processing |
Description: | We address the problem of human-in-the-loop control for generating highly structured data. This task is challenging because existing generative models lack an efficient interface through which users can modify the output. Users can either manually explore a non-interpretable latent space or laboriously annotate the data with conditioning labels. To solve this, we introduce a novel framework whereby an encoder maps a sparse, human-interpretable control space onto the latent space of a generative model. We apply this framework to the task of controlling prosody in text-to-speech synthesis. We propose a model, called Multiple-Instance CVAE (MICVAE), that is specifically designed to encode sparse prosodic features and output complete waveforms. We show empirically that MICVAE displays desirable qualities of a sparse human-in-the-loop control mechanism: efficiency, robustness, and faithfulness. With even a very small number of input values (~4), MICVAE enables users to improve the quality of the output significantly, in terms of listener preference (4:1). 11 pages |
Database: | OpenAIRE |