Showing 1 - 10 of 87 for search: '"Mei-Yuh Hwang"'
Published in:
IEEE/ACM Transactions on Audio, Speech, and Language Processing. 27:1826-1838
End-to-end speech recognition, including attention-based approaches, has become an attractive research topic in recent years. It has achieved performance comparable to that of the traditional speech recognition framework. Because end-to-end approaches integrate
Published in:
IEEE Signal Processing Letters. 26:1471-1475
We apply an anchor-based region proposal network (RPN) for end-to-end keyword spotting (KWS). RPNs have been widely used for object detection in image and video processing; here, an RPN is used to jointly model keyword classification and localization. Th
Published in:
ICASSP
Max-pooling neural network architectures have been proven to be useful for keyword spotting (KWS), but standard training methods suffer from a class-imbalance problem when using all frames from negative utterances. To address the problem, we propose
Published in:
APSIPA
Far-field speech recognition is becoming a hot topic in research and industrial applications. In this paper, in order to improve far-field speech recognition performance, we propose to use multiple fixed beamformers with a spatial Wiener-form postfil
Published in:
INTERSPEECH
Published in:
ACL (1)
Clarifying user needs is essential for existing task-oriented dialogue systems. However, in real-world applications, developers can never guarantee that all possible user demands are taken into account in the design phase. Consequently, existing syst
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::f20a8eabd02f4493ae5be5288238e229
http://arxiv.org/abs/1906.04991
Published in:
ICASSP
Recurrent Neural Networks (RNNs) have dominated language modeling because of their superior performance over traditional N-gram based models. In many applications, a large Recurrent Neural Network language model (RNNLM) or an ensemble of several RNNL
Published in:
ICASSP
Long Short-Term Memory Connectionist Temporal Classification (LSTM-CTC) based end-to-end models are widely used in speech recognition due to their simplicity in training and efficiency in decoding. In conventional LSTM-CTC based models, a bottleneck pr
Published in:
5th International Workshop on Speech Processing in Everyday Environments (CHiME 2018).
Published in:
INTERSPEECH
This paper explores the use of adversarial examples in training speech recognition systems to increase the robustness of deep neural network acoustic models. During training, the fast gradient sign method is used to generate adversarial examples augmenti