The Role of Glottal Source Parameters for High-Quality Transformation of Perceptual Age

Authors: Axel Roebel, Xavier Favory, Nicolas Obin, Gilles Degottex
Contributors: Analyse et synthèse sonores [Paris], Sciences et Technologies de la Musique et du Son (STMS), Institut de Recherche et Coordination Acoustique/Musique (IRCAM)-Université Pierre et Marie Curie - Paris 6 (UPMC)-Centre National de la Recherche Scientifique (CNRS), Obin, Nicolas
Language: English
Publication year: 2015
Subjects:
Computer science
Speech recognition
Speech synthesis
01 natural sciences
Voice analysis
030507 speech-language pathology & audiology
03 medical and health sciences
[STAT.ML]Statistics [stat]/Machine Learning [stat.ML]
Perception
0103 physical sciences
statistical modelling
[SHS.LANGUE]Humanities and Social Sciences/Linguistics
Control (linguistics)
010301 acoustics
[SPI.SIGNAL] Engineering Sciences [physics]/Signal and Image processing
glottal source and vocal tract
Variance (accounting)
[INFO.INFO-SD] Computer Science [cs]/Sound [cs.SD]
Noise
Transformation (function)
voice transformation
0305 other medical science
Source: International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Apr 2015, Brisbane, Australia
Description: The intuitive control of voice transformation (e.g., age/sex, emotions) is useful to extend the expressive repertoire of a voice. This paper explores the role of glottal source parameters for the control of voice transformation. First, the SVLN speech synthesizer (Separation of the Vocal-tract with the Liljencrants-Fant model plus Noise) is used to represent the glottal source parameters (and thus, voice quality) during speech analysis and synthesis. Then, a simple statistical method is presented to control speech parameters during voice transformation: a GMM is used to model the speech parameters of a voice, and regressions are then used to adapt the GMM statistics (mean and variance) to a control parameter (e.g., age/sex, emotions). A subjective experiment conducted on the control of perceptual age proves the importance of the glottal source parameters for the control of voice transformation, and shows the efficiency of the statistical model in controlling voice parameters while preserving the high quality of the voice transformation.
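The description only outlines the statistical control method. The sketch below shows one possible reading of it, under assumptions that are not taken from the paper: a single one-dimensional speech parameter, GMM components that are already trained and aligned across training speakers (GMM training itself is omitted), one straight-line regression per component for the mean and for the log-variance against the control value (e.g. speaker age), and a transformation that normalises each frame with the source-control statistics and de-normalises it with the target-control statistics, mixing components by their posterior probabilities. All function names are hypothetical, and the authors' actual regression and mapping may differ.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def fit_stat_regressions(controls, means, variances):
    """Regress each component's mean and log-variance on the control value.

    controls:  S control values (e.g. ages of S training speakers)
    means:     S lists of K component means (assumption: components aligned across speakers)
    variances: S lists of K component variances (1-D parameter, diagonal model)
    Returns K tuples ((mean_slope, mean_icpt), (logvar_slope, logvar_icpt)).
    Assumption: the log-variance is regressed so predicted variances stay positive.
    """
    models = []
    for k in range(len(means[0])):
        mean_line = linear_fit(controls, [m[k] for m in means])
        logvar_line = linear_fit(controls, [math.log(v[k]) for v in variances])
        models.append((mean_line, logvar_line))
    return models


def stats_at(models, c):
    """Component means and variances predicted for control value c."""
    mus = [ms * c + mi for (ms, mi), _ in models]
    variances = [math.exp(vs * c + vi) for _, (vs, vi) in models]
    return mus, variances


def transform(frames, weights, models, c_src, c_tgt):
    """Map 1-D parameter values from control value c_src to control value c_tgt."""
    mu_s, var_s = stats_at(models, c_src)
    mu_t, var_t = stats_at(models, c_tgt)
    out = []
    for x in frames:
        # log(weight * Gaussian likelihood) of x under each component, source statistics
        logp = [math.log(w) - 0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
                for w, m, v in zip(weights, mu_s, var_s)]
        top = max(logp)  # subtract the maximum for numerical stability
        post = [math.exp(lp - top) for lp in logp]
        total = sum(post)
        # per component: normalise with source stats, de-normalise with target stats,
        # then mix the K candidates by posterior probability
        y = sum(p / total * (mt + math.sqrt(vt / vs) * (x - ms))
                for p, ms, vs, mt, vt in zip(post, mu_s, var_s, mu_t, var_t))
        out.append(y)
    return out


if __name__ == "__main__":
    # Made-up training statistics: 5 speakers, 2 components (illustration only).
    ages = [20.0, 35.0, 50.0, 65.0, 80.0]
    means = [[100 + 0.5 * a, 200 + 1.0 * a] for a in ages]
    variances = [[25 + 0.2 * a, 40 + 0.3 * a] for a in ages]
    models = fit_stat_regressions(ages, means, variances)
    print(transform([110.0, 125.0, 230.0], [0.5, 0.5], models, c_src=25.0, c_tgt=75.0))
```

When the source and target control values are equal, the mapping returns the input unchanged, which is a useful sanity check for any variant of this scheme.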
Database: OpenAIRE