DDSP-BASED SINGING VOCODERS: A NEW SUBTRACTIVE-BASED SYNTHESIZER AND A COMPREHENSIVE EVALUATION.

Authors: Da-Yi Wu, Wen-Yi Hsiao, Fu-Rong Yang, Oscar Friedman, Warren Jackson, Scott Bruzenak, Yi-Wen Liu, Yi-Hsuan Yang
Subject:
Source: International Society for Music Information Retrieval Conference Proceedings; 2022, p76-83, 8p
Abstract: A vocoder is a conditional audio generation model that converts acoustic features such as mel-spectrograms into waveforms. Taking inspiration from Differentiable Digital Signal Processing (DDSP), we propose a new vocoder named SawSing for singing voices. SawSing synthesizes the harmonic part of singing voices by filtering a sawtooth source signal with a linear time-variant finite impulse response filter whose coefficients are estimated from the input mel-spectrogram by a neural network. As this approach enforces phase continuity, SawSing can generate singing voices without the phase-discontinuity glitch of many existing vocoders. Moreover, the source-filter assumption provides an inductive bias that allows SawSing to be trained on a small amount of data. Our evaluation shows that SawSing converges much faster and outperforms state-of-the-art generative adversarial network- and diffusion-based vocoders in a resource-limited scenario with only 3 training recordings and a 3-hour training time. [ABSTRACT FROM AUTHOR]
Database: Complementary Index
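
Illustrative sketch (not part of the record): the subtractive source-filter idea described in the abstract can be approximated in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names `sawtooth_from_f0` and `time_varying_fir`, the frame and hop sizes, and the hand-made filter taps are all hypothetical; in SawSing the filter coefficients are estimated from the mel-spectrogram by a neural network, which is omitted here. The sawtooth is built from a cumulative phase, which is one simple way to keep the phase continuous when the pitch changes.

```python
import numpy as np


def sawtooth_from_f0(f0_hz, sample_rate):
    """Band-limited sawtooth driven by a per-sample f0 contour (assumed > 0 Hz).

    The phase is the running sum of the instantaneous frequency, so it stays
    continuous even when f0 changes from sample to sample.
    """
    f0_hz = np.asarray(f0_hz, dtype=float)
    phase = 2.0 * np.pi * np.cumsum(f0_hz) / sample_rate
    nyquist = sample_rate / 2.0
    max_harmonics = int(nyquist // f0_hz.min())
    signal = np.zeros_like(f0_hz)
    for k in range(1, max_harmonics + 1):
        # Mute any harmonic that would land at or above Nyquist.
        audible = (k * f0_hz) < nyquist
        signal += audible * np.sin(k * phase) / k
    return (2.0 / np.pi) * signal


def time_varying_fir(source, frame_filters, hop_size):
    """Filter each windowed frame with its own FIR taps, then overlap-add.

    frame_filters has shape (num_frames, num_taps). Frames are 2 * hop_size
    long with a periodic Hann window, so the windows sum to one at 50% overlap.
    Requires (num_frames - 1) * hop_size >= len(source).
    """
    source = np.asarray(source, dtype=float)
    num_frames, num_taps = frame_filters.shape
    if (num_frames - 1) * hop_size < len(source):
        raise ValueError("not enough filter frames to cover the source")
    frame_size = 2 * hop_size
    window = np.hanning(frame_size + 1)[:-1]
    padded = np.zeros((num_frames + 1) * hop_size)
    padded[hop_size:hop_size + len(source)] = source
    out = np.zeros(len(padded) + num_taps - 1)
    for i in range(num_frames):
        start = i * hop_size
        frame = padded[start:start + frame_size] * window
        out[start:start + frame_size + num_taps - 1] += np.convolve(
            frame, frame_filters[i]
        )
    return out[hop_size:hop_size + len(source)]


# Usage: a 0.1 s pitch glide, filtered by taps that drift from a pass-through
# filter to an 8-tap moving average. These taps are placeholders standing in
# for the coefficients a trained network would predict.
sample_rate, hop_size, num_taps = 16000, 160, 8
f0 = np.linspace(220.0, 440.0, 1600)
source = sawtooth_from_f0(f0, sample_rate)
num_frames = len(source) // hop_size + 1
blend = np.linspace(0.0, 1.0, num_frames)[:, None]
pass_through = np.eye(num_taps)[0]
moving_average = np.full(num_taps, 1.0 / num_taps)
frame_filters = (1.0 - blend) * pass_through + blend * moving_average
harmonic = time_varying_fir(source, frame_filters, hop_size)
```

Filtering frame by frame with overlap-add is only one way to realize a time-variant FIR filter; it is used here because it keeps the per-frame coefficients explicit and needs nothing beyond NumPy.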