Showing 1 - 9 of 9 for search: '"Luo, Mingshuang"'
Author:
Luo, Mingshuang, Hou, Ruibing, Li, Zhuo, Chang, Hong, Liu, Zimo, Wang, Yaowei, Shan, Shiguang
This paper presents M³GPT, an advanced Multimodal, Multitask framework for Motion comprehension and generation. M³GPT operates on three fundamental principles. The first focuses on creating a unified representation…
External link:
http://arxiv.org/abs/2405.16273
Author:
Guo, Liyong, Yang, Xiaoyu, Wang, Quandong, Kong, Yuxiang, Yao, Zengwei, Cui, Fan, Kuang, Fangjun, Kang, Wei, Lin, Long, Luo, Mingshuang, Zelasko, Piotr, Povey, Daniel
Knowledge distillation (KD) is a common approach to improving model performance in automatic speech recognition (ASR), where a student model is trained to imitate the output behaviour of a teacher model. However, traditional KD methods suffer from teacher…
External link:
http://arxiv.org/abs/2211.00508
Author:
Kang, Wei, Guo, Liyong, Kuang, Fangjun, Lin, Long, Luo, Mingshuang, Yao, Zengwei, Yang, Xiaoyu, Żelasko, Piotr, Povey, Daniel
The transducer architecture is becoming increasingly popular in the field of speech recognition, because it is naturally streaming as well as high in accuracy. One of the drawbacks of the transducer is that it is difficult to decode in a fast and parallel…
External link:
http://arxiv.org/abs/2211.00484
Author:
Kuang, Fangjun, Guo, Liyong, Kang, Wei, Lin, Long, Luo, Mingshuang, Yao, Zengwei, Povey, Daniel
The RNN-Transducer (RNN-T) framework for speech recognition has been growing in popularity, particularly for deployed real-time ASR systems, because it combines high accuracy with naturally streaming recognition. One of the drawbacks of RNN-T is that…
External link:
http://arxiv.org/abs/2206.13236
Author:
Fang, Yitian, Luo, Mingshuang, Ren, Zhixiang, Wei, Leyi, Wei, Dong-Qing
Published in:
Briefings in Bioinformatics, Jul 2024, Vol. 25, Issue 4, pp. 1-12.
Lip reading has received increasing attention in recent years. This paper focuses on the synergy of multilingual lip reading. There are as many as 7,000 languages in the world, which implies that it is impractical to train separate lip reading models…
External link:
http://arxiv.org/abs/2005.03846
Lip-reading aims to infer the speech content from the lip movement sequence and can be seen as a typical sequence-to-sequence (seq2seq) problem which translates the input image sequence of lip movements to the text sequence of the speech content. However…
External link:
http://arxiv.org/abs/2003.03983
Academic article
Author:
Guo, Liyong, Yang, Xiaoyu, Wang, Quandong, Kong, Yuxiang, Yao, Zengwei, Cui, Fan, Kuang, Fangjun, Kang, Wei, Lin, Long, Luo, Mingshuang, Zelasko, Piotr, Povey, Daniel
Published in:
ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
Knowledge distillation (KD) is a common approach to improving model performance in automatic speech recognition (ASR), where a student model is trained to imitate the output behaviour of a teacher model. However, traditional KD methods suffer from teacher…