Showing 1 - 10 of 26 for search: '"Kwangyoun Kim"'
Self-supervised speech representation learning (SSL) has been shown to be effective in various downstream tasks, but SSL models are usually large and slow. Model compression techniques such as pruning aim to reduce the model size and computation without d…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::2942c69905b1ce23a85f82e76bfd85eb
http://arxiv.org/abs/2302.14132
Self-supervised pre-trained transformers have improved the state of the art on a variety of speech tasks. Due to the quadratic time and space complexity of self-attention, they usually operate at the level of relatively short (e.g., utterance) segmen…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::0b489bce9cd24e2e3fafe6de34051604
http://arxiv.org/abs/2212.08542
Conformer, combining convolution and self-attention sequentially to capture both local and global information, has shown remarkable performance and is currently regarded as the state of the art for automatic speech recognition (ASR). Several other st…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::6c6b03c8a7db85167b737b39ed456fa3
http://arxiv.org/abs/2210.00077
Author:
Felix Wu, Kwangyoun Kim, Shinji Watanabe, Kyu J. Han, Ryan McDonald, Kilian Q. Weinberger, Yoav Artzi
We introduce Wav2Seq, the first self-supervised approach to pre-train both parts of encoder-decoder models for speech data. We induce a pseudo language as a compact discrete representation, and formulate a self-supervised pseudo speech recognition ta…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::26b5e5a6b483bb52f21e1898429722c2
Author:
Kwangyoun Kim, Shatrughan Singh, Dhananjaya Gowda, Ankur Kumar, Sachin K. Singh, Chanwoo Kim, Ashutosh Gupta
Published in:
ICASSP
In this paper, we propose methods to compute a confidence score on the predictions made by an end-to-end speech recognition model in a 2-pass framework. We use an RNN-Transducer for the streaming model, and an attention-based decoder for the second-pass mod…
Automatic speech recognition (ASR) models make fewer errors when more surrounding speech information is presented as context. Unfortunately, acquiring a larger future context leads to higher latency. There exists an inevitable trade-off between speed…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::652ae23db727bcb10e10bade7aa5da4c
Author:
Hejung Yang, Dhananjaya Gowda, Kwangyoun Kim, Jiyeon Kim, Mehul Kumar, Sichen Jin, Chanwoo Kim, Ankur Kumar, Sachin K. Singh, Abhinav Garg, Shatrughan Singh
Published in:
INTERSPEECH
Author:
Kyungbo Min, Junmo Park, Aditya Jayasimha, Sichen Jin, Kwangyoun Kim, Jiyeon Kim, Young-Yoon Lee, Chanwoo Kim, Youngho Han, Kim Sooyeon, Abhinav Garg, Gowtham P. Vadisetti, Dhananjaya Gowda
Published in:
INTERSPEECH
Published in:
ICASSP
In this paper, we present a Small Energy Masking (SEM) algorithm, which masks inputs having values below a certain threshold. More specifically, a time-frequency bin is masked if the filterbank energy in this bin is less than a certain energy thresho…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::dbf3fd2352718c3bfb6698265d1727f4
http://arxiv.org/abs/2002.06312
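The thresholding idea described in the abstract above can be sketched in a few lines. This is a minimal illustration, not the paper's exact method: the `threshold_ratio` parameter (a threshold relative to the utterance's peak filterbank energy) and the function name are assumptions for the example, since the snippet cuts off before the paper's threshold rule is stated.

```python
def small_energy_mask(filterbank_energies, threshold_ratio=0.01):
    """Zero out time-frequency bins whose energy falls below a threshold.

    `filterbank_energies` is a list of frames, each a list of per-band
    energies. The peak-relative threshold is an illustrative choice, not
    the rule from the paper.
    """
    # Peak energy over all time-frequency bins in the utterance.
    peak = max(max(frame) for frame in filterbank_energies)
    threshold = threshold_ratio * peak
    # Keep bins at or above the threshold; mask (zero) the rest.
    return [
        [e if e >= threshold else 0.0 for e in frame]
        for frame in filterbank_energies
    ]
```

For example, with a peak energy of 10.0 and `threshold_ratio=0.01`, any bin with energy below 0.1 is zeroed while the rest pass through unchanged.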
Author:
Sung-Soo Kim, Minkyoo Shin, Dhananjaya Gowda, Jiyeon Kim, Abhinav Garg, Shatrughan Singh, Eunhyang Kim, Mehul Kumar, Kwangyoun Kim, Chanwoo Kim, Changwoo Han, Larry Heck, Kyungmin Lee
Published in:
ASRU
In this paper, we present an end-to-end training framework for building state-of-the-art end-to-end speech recognition systems. Our training system utilizes a cluster of Central Processing Units (CPUs) and Graphics Processing Units (GPUs). The entire…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::c7820ff2e2695f699fd04128413983fd
http://arxiv.org/abs/1912.11040