Showing 1 - 10 of 15 for search: '"Shaojin Ding"'
Author:
Tom O’Malley, Shaojin Ding, Arun Narayanan, Quan Wang, Rajeev Rikhye, Qiao Liang, Yanzhang He, Ian McGraw
Published in:
ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
Author:
Weiran Wang, Ding Zhao, Shaojin Ding, Hao Zhang, Shuo-Yiin Chang, David Rybach, Tara N. Sainath, Yanzhang He, Ian McGraw, Shankar Kumar
Published in:
ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
Author:
Shaojin Ding, Weiran Wang, Ding Zhao, Tara Sainath, Yanzhang He, Robert David, Rami Botros, Xin Wang, Rina Panigrahy, Qiao Liang, Dongseong Hwang, Ian McGraw, Rohit Prabhavalkar, Trevor Strohman
In this paper, we propose a dynamic cascaded encoder Automatic Speech Recognition (ASR) model, which unifies models for different deployment scenarios. Moreover, the model can significantly reduce model size and power consumption without loss of qual
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::d8444eebad257eba5a9ce7891486e95e
http://arxiv.org/abs/2204.06164
Published in:
IEEE/ACM Transactions on Audio, Speech, and Language Processing. 29:2367-2381
Foreign accent conversion (FAC) is the problem of generating a synthetic voice that has the voice identity of a second-language (L2) learner and the pronunciation patterns of a native (L1) speaker. This synthetic voice has been referred to as a “go
Published in:
IEEE/ACM Transactions on Audio, Speech, and Language Processing. 28:343-354
Sparse-coding techniques for voice conversion assume that an utterance can be decomposed into a sparse code that only carries linguistic contents, and a dictionary of atoms that captures the speakers’ characteristics. However, conventional dictiona
Author:
Ricardo Gutierrez-Osuna, Guanlong Zhao, Christopher Liberatore, Shaojin Ding, John M. Levis, Evgeny Chukharev-Hudilainen, Alif Silpachai, Ivana Lucic, Sinem Sonsaat
Published in:
Speech Communication. 115:51-66
The type of voice model used in Computer Assisted Pronunciation Instruction is a crucial factor in the quality of practice and the amount of uptake by language learners. As an example, prior research indicates that second-language learners are more l
Published in:
INTERSPEECH
Published in:
Computer Speech & Language. 72:101302
Foreign accent conversion (FAC) aims to create a new voice that has the voice identity of a given second-language (L2) speaker but with a native (L1) accent. Previous FAC approaches usually require training a separate model for each L2 speaker and, m
In this paper, we propose Textual Echo Cancellation (TEC) - a framework for cancelling the text-to-speech (TTS) playback echo from overlapping speech recordings. Such a system can largely improve speech recognition performance and user experience for
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::3ca151b58841b19551a1ce8dfb063cbc
http://arxiv.org/abs/2008.06006
Published in:
INTERSPEECH
Speaker recognition systems based on Convolutional Neural Networks (CNNs) are often built with off-the-shelf backbones such as VGG-Net or ResNet. However, these backbones were originally proposed for image classification, and therefore may not be nat
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::81d39edf4f68801e93a951b73f520907