Showing 1 - 2 of 2 for search: '"Park, Heayoung"'
This paper presents FastFit, a novel neural vocoder architecture that replaces the U-Net encoder with multiple short-time Fourier transforms (STFTs) to achieve faster generation rates without sacrificing sample quality. We replaced each encoder block
External link:
http://arxiv.org/abs/2305.10823
We propose Jointly trained Duration Informed Transformer (JDI-T), a feed-forward Transformer with a duration predictor jointly trained without explicit alignments in order to generate an acoustic feature sequence from an input text. In this work, ins
External link:
http://arxiv.org/abs/2005.07799