Showing 1 - 10 of 30 for search: '"Han, Kyu J"'
Author:
Peri, Raghuveer, Jayanthi, Sai Muralidhar, Ronanki, Srikanth, Bhatia, Anshu, Mundnich, Karel, Dingliwal, Saket, Das, Nilaksh, Hou, Zejiang, Huybrechts, Goeric, Vishnubhotla, Srikanth, Garcia-Romero, Daniel, Srinivasan, Sundararajan, Han, Kyu J, Kirchhoff, Katrin
Integrated Speech and Large Language Models (SLMs) that can follow speech instructions and generate relevant text responses have gained popularity lately. However, the safety and robustness of these models remain largely unclear. In this work, we…
External link:
http://arxiv.org/abs/2405.08317
Author:
Das, Nilaksh, Dingliwal, Saket, Ronanki, Srikanth, Paturi, Rohit, Huang, Zhaocheng, Mathur, Prashant, Yuan, Jie, Bekal, Dhanush, Niu, Xing, Jayanthi, Sai Muralidhar, Li, Xilai, Mundnich, Karel, Sunkara, Monica, Srinivasan, Sundararajan, Han, Kyu J, Kirchhoff, Katrin
Large language models (LLMs) have shown incredible proficiency in performing tasks that require semantic understanding of natural language instructions. Recently, many works have further expanded this capability to perceive multimodal audio and text…
External link:
http://arxiv.org/abs/2405.08295
Author:
Goncalves, Lucas, Mathur, Prashant, Lavania, Chandrashekhar, Cekic, Metehan, Federico, Marcello, Han, Kyu J.
Recent advancements in audio-visual generative modeling have been propelled by progress in deep learning and the availability of data-rich benchmarks. However, the growth is not attributed solely to models and benchmarks. Universally accepted…
External link:
http://arxiv.org/abs/2404.07336
Author:
Kim, Kwangyoun, Wu, Felix, Peng, Yifan, Pan, Jing, Sridhar, Prashant, Han, Kyu J., Watanabe, Shinji
Conformer, combining convolution and self-attention sequentially to capture both local and global information, has shown remarkable performance and is currently regarded as the state-of-the-art for automatic speech recognition (ASR). Several other…
External link:
http://arxiv.org/abs/2210.00077
Spoken language understanding (SLU) tasks involve mapping from speech audio signals to semantic labels. Given the complexity of such tasks, good performance might be expected to require large labeled datasets, which are difficult to collect for each…
External link:
http://arxiv.org/abs/2112.07648
Author:
Shon, Suwon, Pasad, Ankita, Wu, Felix, Brusco, Pablo, Artzi, Yoav, Livescu, Karen, Han, Kyu J.
Progress in speech processing has been facilitated by shared datasets and benchmarks. Historically these have focused on automatic speech recognition (ASR), speaker identification, or other lower-level tasks. Interest has been growing in higher-level…
External link:
http://arxiv.org/abs/2111.10367
Automatic speech recognition (ASR) models make fewer errors when more surrounding speech information is presented as context. Unfortunately, acquiring a larger future context leads to higher latency. There exists an inevitable trade-off between speed…
External link:
http://arxiv.org/abs/2106.09760
In this paper, we explore the use of pre-trained language models to learn sentiment information of written texts for speech sentiment analysis. First, we investigate how useful a pre-trained language model would be in a 2-step pipeline approach…
External link:
http://arxiv.org/abs/2106.06598
Author:
Park, Tae Jin, Kanda, Naoyuki, Dimitriadis, Dimitrios, Han, Kyu J., Watanabe, Shinji, Narayanan, Shrikanth
Speaker diarization is a task to label audio or video recordings with classes that correspond to speaker identity, or in short, a task to identify "who spoke when". In the early years, speaker diarization algorithms were developed for speech…
External link:
http://arxiv.org/abs/2101.09624
In this paper we present state-of-the-art (SOTA) performance on the LibriSpeech corpus with two novel neural network architectures, a multistream CNN for acoustic modeling and a self-attentive simple recurrent unit (SRU) for language modeling. In the…
External link:
http://arxiv.org/abs/2005.10469