HiFi-Glot: Neural Formant Synthesis with Differentiable Resonant Filters

Author:	Juvela, Lauri; Zarazaga, Pablo Pérez; Henter, Gustav Eje; Malisz, Zofia
Year of publication:	2024
Subject:	Computer Science - Sound; Electrical Engineering and Systems Science - Audio and Speech Processing
Document type:	Working Paper
Description:	We introduce an end-to-end neural speech synthesis system that uses the source-filter model of speech production. Specifically, we apply differentiable resonant filters to a glottal waveform generated by a neural vocoder. The aim is to obtain a controllable synthesiser, similar to classic formant synthesis, but with much higher perceptual quality, filling a research gap in current neural waveform generators and responding to hitherto unmet needs in the speech sciences. Our setup generates audio from a core set of phonetically meaningful speech parameters, with the filters providing direct control over formant frequencies in synthesis. Direct synthesis control is a key feature for reliable stimulus creation in important speech science experiments. We show that the proposed source-filter method gives better perceptual quality than the industry standard for formant manipulation (i.e., Praat), whilst being competitive in terms of formant frequency control accuracy.
Comment:	Submitted to ICASSP 2025
Database:	arXiv
External link:	http://arxiv.org/abs/2409.14823
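
The description above refers to resonant filters applied to a glottal source signal. As a rough illustration of that source-filter idea only, the sketch below cascades classic two-pole resonators over an impulse train. It is not the authors' implementation: the function names, the impulse-train stand-in for the neural glottal waveform, the unit-gain-at-0-Hz scaling, and the formant values are all assumptions chosen for illustration, and plain Python is used here, so nothing in it is differentiable.

```python
import math


def resonator_coeffs(freq_hz, bandwidth_hz, sample_rate):
    """Coefficients of a two-pole resonator
    H(z) = g / (1 + a1 z^-1 + a2 z^-2), scaled to unit gain at 0 Hz."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)  # pole radius from bandwidth
    theta = 2.0 * math.pi * freq_hz / sample_rate        # pole angle from centre frequency
    a1 = -2.0 * r * math.cos(theta)
    a2 = r * r
    g = 1.0 + a1 + a2                                    # makes H(1) equal to 1
    return g, a1, a2


def apply_resonator(signal, freq_hz, bandwidth_hz, sample_rate):
    """Run one resonator over a list of samples (direct-form recursion)."""
    g, a1, a2 = resonator_coeffs(freq_hz, bandwidth_hz, sample_rate)
    y1 = y2 = 0.0  # previous two output samples
    out = []
    for x in signal:
        y = g * x - a1 * y1 - a2 * y2
        out.append(y)
        y1, y2 = y, y1
    return out


def formant_filter(source, formants, sample_rate):
    """Cascade one resonator per (frequency, bandwidth) pair."""
    out = list(source)
    for freq_hz, bandwidth_hz in formants:
        out = apply_resonator(out, freq_hz, bandwidth_hz, sample_rate)
    return out


def magnitude_at(signal, freq_hz, sample_rate):
    """Magnitude of a single DFT bin, used to inspect the spectrum."""
    w = 2.0 * math.pi * freq_hz / sample_rate
    re = sum(x * math.cos(w * n) for n, x in enumerate(signal))
    im = sum(x * math.sin(w * n) for n, x in enumerate(signal))
    return math.hypot(re, im)


SAMPLE_RATE = 16000
# Assumed stand-in excitation: a 100 Hz impulse train, 0.25 s long.
source = [1.0 if n % 160 == 0 else 0.0 for n in range(SAMPLE_RATE // 4)]
# Illustrative (frequency, bandwidth) pairs in Hz, not taken from the paper.
FORMANTS = [(700.0, 80.0), (1200.0, 90.0), (2600.0, 120.0)]
speech_like = formant_filter(source, FORMANTS, SAMPLE_RATE)
print(len(speech_like), "samples synthesised")
```

Changing a frequency in `FORMANTS` moves the corresponding spectral peak, which is the kind of direct formant control the description mentions. In a differentiable setting the same recursion would be expressed in an autodiff framework so that gradients can reach the filter parameters.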