A deep architecture with bilinear modeling of hidden representations: Applications to phonetic recognition.

Authors: Hutchinson, Brian; Deng, Li; Yu, Dong
Source: 2012 IEEE International Conference on Acoustics, Speech & Signal Processing (ICASSP); 1/1/2012, pp. 4805-4808 (4 pages)
Abstract: We develop and describe a novel deep architecture, the Tensor Deep Stacking Network (T-DSN), where multiple blocks are stacked one on top of another and where a bilinear mapping from hidden representations to the output in each block is used to incorporate higher-order statistics of the input features. A learning algorithm for the T-DSN is presented, in which the main parameter estimation burden is shifted to a convex sub-problem with a closed-form solution. Using an efficient and scalable parallel implementation, we train a T-DSN to discriminate standard three-state monophones in the TIMIT database. The T-DSN outperforms an alternative pretrained Deep Neural Network (DNN) architecture in frame-level classification (both state and phone) and in the cross-entropy measure. For continuous phonetic recognition, T-DSN performs equivalently to a DNN but without the need for a hard-to-scale, sequential fine-tuning step. [ABSTRACT FROM PUBLISHER]
Database: Complementary Index
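
Illustrative note: the abstract describes a bilinear mapping from hidden representations to the output, with the main estimation step reduced to a convex sub-problem that has a closed-form solution. The following is a minimal sketch of one such block, not the authors' implementation. It assumes two sigmoid hidden layers whose outer product forms the bilinear features, and it assumes the convex sub-problem is a ridge-regularized least-squares fit of the output weights. All names (`bilinear_features`, `fit_output_weights`, `ridge`), the layer sizes, and the random data are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bilinear_features(X, W1, W2):
    """Flattened outer product of two hidden layers, per example.

    X: (n, d), W1: (d, k1), W2: (d, k2) -> result: (n, k1 * k2).
    Assumption: both hidden layers are sigmoid units on the same input.
    """
    H1 = sigmoid(X @ W1)
    H2 = sigmoid(X @ W2)
    return np.einsum("ni,nj->nij", H1, H2).reshape(X.shape[0], -1)

def fit_output_weights(H, T, ridge=1e-3):
    """Closed-form ridge solution U = (H^T H + ridge * I)^{-1} H^T T.

    Assumption: the convex sub-problem is regularized least squares.
    """
    A = H.T @ H + ridge * np.eye(H.shape[1])
    return np.linalg.solve(A, H.T @ T)

# Hypothetical toy data: 200 frames, 10 input features, 3 classes.
rng = np.random.default_rng(0)
n, d, k1, k2, c = 200, 10, 4, 5, 3
X = rng.normal(size=(n, d))
T = np.eye(c)[rng.integers(0, c, size=n)]   # one-hot targets

# Hidden-layer weights are random here; how they are trained is not
# specified in the abstract.
W1 = rng.normal(size=(d, k1))
W2 = rng.normal(size=(d, k2))

H = bilinear_features(X, W1, W2)   # (n, k1 * k2)
U = fit_output_weights(H, T)       # (k1 * k2, c)
Y = H @ U                          # block predictions, (n, c)

# Assumption: the next block sees the raw input joined with this
# block's predictions, which is one common way to stack such blocks.
X_next = np.hstack([X, Y])         # (n, d + c)
```

Because the output weights come from a single linear solve for fixed hidden-layer weights, this step needs no iterative updates, which may be part of what makes the batch, parallel training mentioned in the abstract practical.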