Characterization of inter-speaker articulatory variability: A two-level multi-speaker modelling approach based on MRI data

Author:	Pierre Badin, Antoine Serrurier, Christiane Neuschaefer-Rube, Laurent Lamalle
Contributors:	Rheinisch-Westfälische Technische Hochschule Aachen University (RWTH); GIPSA - Cognitive Robotics, Interactive Systems, & Speech Processing (GIPSA-CRISSP); Département Parole et Cognition (GIPSA-DPC); Grenoble Images Parole Signal Automatique (GIPSA-lab); Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP); Centre National de la Recherche Scientifique (CNRS); Université Grenoble Alpes [2016-2019] (UGA [2016-2019]); RMN biomédicale : de la cellule à l'homme (RBCH); Université Pierre Mendès France - Grenoble 2 (UPMF); Université Joseph Fourier - Grenoble 1 (UJF); CHU Grenoble; DIR CENTRALE DU SSA; Institut National de la Santé et de la Recherche Médicale (INSERM); Badin, Pierre
Language:	English
Year of publication:	2019
Subject:	Adult; Male; Glottis; multi-speaker models; Acoustics and Ultrasonics; Computer science; Speech recognition; [SHS.INFO] Humanities and Social Sciences/Library and information sciences; Intelligibility (communication); Speech Acoustics; 030507 speech-language pathology & audiology; 03 medical and health sciences; Arts and Humanities (miscellaneous); Phonetics; vocal tract; Humans; 030304 developmental biology; 0303 health sciences; articulatory modelling; Speech Intelligibility; Variance (accounting); Middle Aged; Models, Theoretical; Magnetic Resonance Imaging; Biological Variation, Population; inter-speaker variability; Voice; Female; 0305 other medical science; Vocal tract MRI
Source:	Journal of the Acoustical Society of America, Acoustical Society of America, 2019, 145 (4), pp.2149-2170. ⟨10.1121/1.5096631⟩
ISSN:	0001-4966; 1520-8524
Description:	Speech communication relies on articulatory and acoustic codes shared between speakers and listeners despite inter-individual differences in morphology and idiosyncratic articulatory strategies. This study addresses the long-standing problem of characterizing and modelling speaker-independent articulatory strategies and inter-speaker articulatory variability. It explores a multi-speaker modelling approach based on two levels: statistically-based linear articulatory models, which capture the speaker-specific articulatory variability on the one hand, are in turn controlled by a speaker model, which captures the inter-speaker variability on the other hand. A low dimensionality speaker model is obtained by taking advantage of the inter-speaker correlations between morphology and strategy. To validate this approach, contours of the vocal tract articulators were manually segmented on midsagittal MRI data recorded from 11 French speakers uttering 62 vowels and consonants. Using these contours, multi-speaker models with 14 articulatory components and two morphology and strategy components led to overall variance explanations of 66%–69% and root-mean-square errors of 0.36–0.38 cm obtained in leave-one-out procedure over the speakers. Results suggest that inter-speaker variability is more related to the morphology than to the idiosyncratic strategies and illustrate the adaptation of the articulatory components to the morphology.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::d0903d6597bf8536a7880a48055e136d https://hal.archives-ouvertes.fr/hal-02106595