Parametric Representation for Singing Voice Synthesis: a Comparative Evaluation
Author: | Thomas Drugman, Tuomo Raitio, Daniel Erro, Onur Babacan, Thierry Dutoit |
---|---|
Publication year: | 2020 |
Subject: |
FOS: Computer and information sciences; Sound (cs.SD); Computer Science - Computation and Language; Computer science; Stochastic modelling; Speech recognition; SIGNAL (programming language); 020206 networking & telecommunications; Context (language use); 02 engineering and technology; Speech processing; Computer Science - Sound; Singing voice synthesis; 030507 speech-language pathology & audiology; 03 medical and health sciences; Noise; Audio and Speech Processing (eess.AS); 0202 electrical engineering, electronic engineering, information engineering; Harmonic; FOS: Electrical engineering, electronic engineering, information engineering; 0305 other medical science; Representation (mathematics); Computation and Language (cs.CL); Parametric statistics; Electrical Engineering and Systems Science - Audio and Speech Processing |
Source: | ICASSP |
DOI: | 10.48550/arxiv.2006.04142 |
Description: | Various parametric representations have been proposed to model the speech signal. While the performance of such vocoders is well known in the context of speech processing, their extrapolation to singing voice synthesis might not be straightforward. The goal of this paper is twofold. First, a comparative subjective evaluation is performed across four existing techniques suitable for statistical parametric synthesis: the traditional pulse vocoder, the Deterministic plus Stochastic Model, the Harmonic plus Noise Model and GlottHMM. The behavior of these techniques as a function of the singer type (baritone, counter-tenor and soprano) is studied. Second, the artifacts occurring in high-pitched voices are discussed and possible approaches to overcome them are suggested. |
Database: | OpenAIRE |