Parametric Representation for Singing Voice Synthesis: a Comparative Evaluation

Authors: Thomas Drugman, Tuomo Raitio, Daniel Erro, Onur Babacan, Thierry Dutoit
Publication year: 2020
Subjects:
FOS: Computer and information sciences
Sound (cs.SD)
Computer Science - Computation and Language
Computer science
Stochastic modelling
Speech recognition
SIGNAL (programming language)
020206 networking & telecommunications
Context (language use)
02 engineering and technology
Speech processing
Computer Science - Sound
Singing voice synthesis
030507 speech-language pathology & audiology
03 medical and health sciences
Noise
Audio and Speech Processing (eess.AS)
0202 electrical engineering, electronic engineering, information engineering
Harmonic
FOS: Electrical engineering, electronic engineering, information engineering
0305 other medical science
Representation (mathematics)
Computation and Language (cs.CL)
Parametric statistics
Electrical Engineering and Systems Science - Audio and Speech Processing
Source: ICASSP
DOI: 10.48550/arxiv.2006.04142
Description: Various parametric representations have been proposed to model the speech signal. While the performance of such vocoders is well known in the context of speech processing, their extrapolation to singing voice synthesis might not be straightforward. The goal of this paper is twofold. First, a comparative subjective evaluation is performed across four existing techniques suitable for statistical parametric synthesis: the traditional pulse vocoder, the Deterministic plus Stochastic Model, the Harmonic plus Noise Model, and GlottHMM. The behavior of these techniques as a function of the singer type (baritone, counter-tenor, and soprano) is studied. Second, the artifacts occurring in high-pitched voices are discussed, and possible approaches to overcome them are suggested.
Database: OpenAIRE