Vocal Tract Contour Tracking in rtMRI Using Deep Temporal Regression Network

Authors: Engin Erzin, Sasan Asadiabadi
Contributors: Asadiabadi, Sasan, Erzin, Engin (ORCID 0000-0002-2715-2368 & YÖK ID 34503), Graduate School of Sciences and Engineering, College of Engineering, Department of Electrical and Electronics Engineering, Department of Computer Engineering
Year of publication: 2020
Source: IEEE/ACM Transactions on Audio, Speech, and Language Processing
ISSN: 2329-9290, 2329-9304
DOI: 10.1109/taslp.2020.3036182
Description: Recent advances in real-time Magnetic Resonance Imaging (rtMRI) provide an invaluable tool for studying speech articulation. In this paper, we present an effective deep learning approach for supervised detection and tracking of vocal tract contours in a sequence of rtMRI frames. We train a single-input multiple-output deep temporal regression network (DTRN) to detect the vocal tract (VT) contour and the separation boundaries between different articulators. The DTRN learns the non-linear mapping from an overlapping fixed-length sequence of rtMRI frames to the corresponding articulatory movements, where a blend of the overlapping contour estimates defines the detected VT contour. The detected contour is refined in a post-processing stage using an appearance model to further improve the accuracy of VT contour detection. The proposed VT contour tracking model is trained and evaluated on the USC-TIMIT dataset. Performance evaluation is carried out using three objective assessment metrics covering separating-landmark detection, contour tracking, and temporal stability of the contour landmarks, in comparison with three baseline approaches from the recent literature. Results indicate significant improvements with the proposed method over the state-of-the-art baselines.
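The abstract describes a sliding-window regression scheme: a network maps a fixed-length window of rtMRI frames to per-frame contour landmarks, and overlapping window predictions are blended into one contour track. The sketch below is a minimal illustration of that idea only, not the authors' implementation; the window length, frame resolution, landmark count, network layers, and simple averaging blend are all illustrative assumptions.

```python
# Minimal sketch of a sliding-window temporal regression with overlap blending.
# All sizes and layer choices are assumptions for illustration, not values from the paper.
import torch
import torch.nn as nn

WIN = 5            # frames per input window (assumed)
H = W = 68         # rtMRI frame resolution (assumed)
N_LANDMARKS = 170  # contour landmarks per frame (assumed)

class TemporalRegressionNet(nn.Module):
    """Single-input multiple-output regressor: one window in, per-frame landmarks out."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(WIN, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Flatten(),
        )
        feat = 64 * (H // 4) * (W // 4)
        # one regression head per frame in the window (multiple outputs)
        self.heads = nn.ModuleList(
            [nn.Linear(feat, 2 * N_LANDMARKS) for _ in range(WIN)]
        )

    def forward(self, x):                      # x: (B, WIN, H, W)
        z = self.encoder(x)
        return torch.stack([h(z) for h in self.heads], dim=1)  # (B, WIN, 2*N)

def blend_overlapping(video, model):
    """Slide the window over the clip with stride 1 and average the overlapping
    per-frame predictions into a single contour estimate per frame."""
    T = video.shape[0]
    acc = torch.zeros(T, 2 * N_LANDMARKS)
    cnt = torch.zeros(T, 1)
    with torch.no_grad():
        for t in range(T - WIN + 1):
            pred = model(video[t:t + WIN].unsqueeze(0))[0]   # (WIN, 2*N)
            acc[t:t + WIN] += pred
            cnt[t:t + WIN] += 1
    return acc / cnt                                          # (T, 2*N)

model = TemporalRegressionNet()
video = torch.rand(40, H, W)           # toy 40-frame rtMRI clip
contours = blend_overlapping(video, model)
print(contours.shape)                  # torch.Size([40, 340])
```

The appearance-model refinement mentioned in the abstract would operate on these blended landmark estimates as a separate post-processing step; it is omitted here.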
Database: OpenAIRE