The Role of Glottal Source Parameters for High-Quality Transformation of Perceptual Age

Authors:	Axel Roebel, Xavier Favory, Nicolas Obin, Gilles Degottex
Contributors:	Analyse et synthèse sonores [Paris], Sciences et Technologies de la Musique et du Son (STMS), Institut de Recherche et Coordination Acoustique/Musique (IRCAM)-Université Pierre et Marie Curie - Paris 6 (UPMC)-Centre National de la Recherche Scientifique (CNRS), Obin, Nicolas
Language:	English
Publication year:	2015
Subject:	Computer science; Speech recognition; media_common.quotation_subject; Speech synthesis; computer.software_genre; 01 natural sciences; Voice analysis; 030507 speech-language pathology & audiology; 03 medical and health sciences; [STAT.ML] Statistics [stat]/Machine Learning [stat.ML]; Perception; 0103 physical sciences; statistical modelling; [SHS.LANGUE] Humanities and Social Sciences/Linguistics; Control (linguistics); 010301 acoustics; media_common; [SPI.SIGNAL] Engineering Sciences [physics]/Signal and Image processing; glottal source and vocal tract; Variance (accounting); [INFO.INFO-SD] Computer Science [cs]/Sound [cs.SD]; Noise; Transformation (function); voice transformation; 0305 other medical science; computer
Source:	International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Apr 2015, Brisbane, Australia
Description:	The intuitive control of voice transformation (e.g., age/sex, emotions) is useful to extend the expressive repertoire of a voice. This paper explores the role of glottal source parameters in the control of voice transformation. First, the SVLN speech synthesizer (Separation of the Vocal-tract with the Liljencrants-Fant model plus Noise) is used to represent the glottal source parameters (and thus voice quality) during speech analysis and synthesis. Then, a simple statistical method is presented to control speech parameters during voice transformation: a GMM is used to model the speech parameters of a voice, and regressions are then used to adapt the GMM's statistics (mean and variance) to a control parameter (e.g., age/sex, emotions). A subjective experiment on the control of perceptual age demonstrates the importance of the glottal source parameters for the control of voice transformation, and shows that the statistical model controls voice parameters efficiently while preserving a high quality of the voice transformation.
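The statistical control step described in the abstract can be illustrated roughly as follows. This is only a minimal sketch under stated assumptions, not the authors' implementation: it handles a single Gaussian component and one scalar speech parameter rather than a full GMM, it assumes linear regressions of the mean and of the log-variance on age, and it applies a simple mean/variance normalization to move a value from a source age to a target age. All function names (`fit_line`, `fit_age_regressions`, `predict_stats`, `transform`) and the toy numbers are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit y ~ a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def fit_age_regressions(ages, means, variances):
    """Regress a component's mean and log-variance on the control parameter (age).

    Assumption: one (mean, variance) pair per speaker age, e.g. taken from the
    same Gaussian component of per-speaker models.
    """
    mean_fit = fit_line(ages, means)
    # Regressing the log keeps the predicted variance strictly positive.
    logvar_fit = fit_line(ages, [math.log(v) for v in variances])
    return mean_fit, logvar_fit


def predict_stats(model, age):
    """Predicted (mean, variance) of the parameter at the given age."""
    (a_m, b_m), (a_v, b_v) = model
    return a_m + b_m * age, math.exp(a_v + b_v * age)


def transform(x, model, source_age, target_age):
    """Map a parameter value from source-age to target-age statistics."""
    mu_s, var_s = predict_stats(model, source_age)
    mu_t, var_t = predict_stats(model, target_age)
    return mu_t + math.sqrt(var_t / var_s) * (x - mu_s)


# Toy data, invented purely for illustration (not taken from the paper).
ages = [20.0, 40.0, 60.0, 80.0]
means = [1.0, 1.5, 2.0, 2.5]
variances = [0.04, 0.04, 0.04, 0.04]

model = fit_age_regressions(ages, means, variances)
shifted = transform(1.2, model, source_age=20.0, target_age=60.0)
```

In this toy setup the value keeps its offset from the mean while the mean itself follows the regression toward the target age. A fuller version would presumably repeat this per GMM component and per parameter dimension, weighting by component posteriors.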
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::555ff1470d1e96358d09cd5d35ea595f https://hal.archives-ouvertes.fr/hal-01164562/file/index.pdf