Accelerating the Centerline Processing of Vocal Tract Shapes for Articulatory Synthesis
Autor: | Karpinski, Romain, Ribeiro, Vinicius, Laprie, Yves |
---|---|
Přispěvatelé: | Service Informatique de Soutien à la Recherche (SISR), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS), Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS), Laprie, Yves |
Jazyk: | angličtina |
Rok vydání: | 2022 |
Předmět: | |
Zdroj: | ICA 2022-24th International Congress on Acoustics ICA 2022-24th International Congress on Acoustics, Oct 2022, Gyeongyu, South Korea |
Popis: | International audience; Acoustic simulations used in the articulatory synthesis of speech take a series of vocal tract shapes as an input. Acoustic simulations assume a plane wave propagation, simplifying and limiting the calculation time. It is, therefore, necessary to split 2D vocal tract shapes into small tubes perpendicular to the centerline simulating the plane wave propagation. The algorithm developed previously used a time-consuming regularization step whose computation time was close to that of acoustic simulations. Therefore, we explored the possibility of using deep learning to perform this step and accelerate the whole synthesis process. We used a database with a large number of rt-MRI images (150 000) and our regularizing algorithm for training. Two architectures were tested, one using a regression strategy applied to the two curves defining the vocal tract and one exploiting the classification of pixels in 2D images of the vocal tract. The first turned out to be much faster, even if it requires checking that the center line is correct and, in some sporadic cases using the initial algorithm as a fallback solution. |
Databáze: | OpenAIRE |
Externí odkaz: |