Showing 1 - 10
of 1,045
for search: '"Gandhe A"'
Author:
Lin, Guan-Ting, Shivakumar, Prashanth Gurunath, Gourav, Aditya, Gu, Yile, Gandhe, Ankur, Lee, Hung-yi, Bulyko, Ivan
While textless Spoken Language Models (SLMs) have shown potential in end-to-end speech-to-speech modeling, they still lag behind text-based Large Language Models (LLMs) in terms of semantic coherence and relevance. This work introduces the Align-SLM…
External link:
http://arxiv.org/abs/2411.01834
Author:
Shivakumar, Prashanth Gurunath, Kolehmainen, Jari, Gourav, Aditya, Gu, Yi, Gandhe, Ankur, Rastrow, Ariya, Bulyko, Ivan
Large language models (LLMs) have demonstrated the ability to understand human language by leveraging large amounts of text data. Automatic speech recognition (ASR) systems are often limited by available transcribed speech data and benefit from a second…
External link:
http://arxiv.org/abs/2409.16654
Author:
Kolehmainen, Jari, Gourav, Aditya, Shivakumar, Prashanth Gurunath, Gu, Yile, Gandhe, Ankur, Rastrow, Ariya, Strimel, Grant, Bulyko, Ivan
Retrieval is a widely adopted approach for improving language models by leveraging external information. As the field moves towards multi-modal large language models, it is important to extend purely text-based methods to incorporate other modalities…
External link:
http://arxiv.org/abs/2406.09618
Author:
Yu, Yu, Yang, Chao-Han Huck, Dinh, Tuan, Ryu, Sungho, Kolehmainen, Jari, Ren, Roger, Filimonov, Denis, Shivakumar, Prashanth G., Gandhe, Ankur, Rastrow, Ariya, Xu, Jia, Bulyko, Ivan, Stolcke, Andreas
The use of low-rank adaptation (LoRA) with frozen pretrained language models (PLMs) has become increasingly popular as a mainstream, resource-efficient modeling approach for memory-constrained hardware. In this study, we first explore how to enhance mo…
External link:
http://arxiv.org/abs/2401.10447
Author:
Everson, Kevin, Gu, Yile, Yang, Huck, Shivakumar, Prashanth Gurunath, Lin, Guan-Ting, Kolehmainen, Jari, Bulyko, Ivan, Gandhe, Ankur, Ghosh, Shalini, Hamza, Wael, Lee, Hung-yi, Rastrow, Ariya, Stolcke, Andreas
In the realm of spoken language understanding (SLU), numerous natural language understanding (NLU) methodologies have been adapted by supplying large language models (LLMs) with transcribed speech instead of conventional written text. In real-world s…
External link:
http://arxiv.org/abs/2401.02921
Author:
Lin, Guan-Ting, Shivakumar, Prashanth Gurunath, Gandhe, Ankur, Yang, Chao-Han Huck, Gu, Yile, Ghosh, Shalini, Stolcke, Andreas, Lee, Hung-yi, Bulyko, Ivan
Large Language Models (LLMs) have demonstrated superior abilities in tasks such as chatting, reasoning, and question-answering. However, standard LLMs may ignore crucial paralinguistic information, such as sentiment, emotion, and speaking style, which…
External link:
http://arxiv.org/abs/2312.15316
Author:
Shivakumar, Prashanth Gurunath, Kolehmainen, Jari, Gu, Yile, Gandhe, Ankur, Rastrow, Ariya, Bulyko, Ivan
Second-pass rescoring is a critical component of competitive automatic speech recognition (ASR) systems. Large language models have demonstrated their ability to use pre-trained information for better rescoring of ASR hypotheses. Discriminative tra…
External link:
http://arxiv.org/abs/2310.06248
Author:
Yu, Yu, Yang, Chao-Han Huck, Kolehmainen, Jari, Shivakumar, Prashanth G., Gu, Yile, Ryu, Sungho, Ren, Roger, Luo, Qi, Gourav, Aditya, Chen, I-Fan, Liu, Yi-Chieh, Dinh, Tuan, Gandhe, Ankur, Filimonov, Denis, Ghosh, Shalini, Stolcke, Andreas, Rastrow, Ariya, Bulyko, Ivan
Published in:
Proc. IEEE ASRU Workshop, Dec. 2023
We propose a neural language modeling system based on low-rank adaptation (LoRA) for speech recognition output rescoring. Although pretrained language models (LMs) like BERT have shown superior performance in second-pass rescoring, the high computati…
External link:
http://arxiv.org/abs/2309.15223
Author:
Kolehmainen, Jari, Gu, Yile, Gourav, Aditya, Shivakumar, Prashanth Gurunath, Gandhe, Ankur, Rastrow, Ariya, Bulyko, Ivan
Recognition of personalized content remains a challenge in end-to-end speech recognition. We explore three novel approaches that use personalized content in a neural rescoring step to improve recognition: gazetteers, prompting, and a cross-attention…
External link:
http://arxiv.org/abs/2307.06832
Author:
Gu, Yile, Shivakumar, Prashanth Gurunath, Kolehmainen, Jari, Gandhe, Ankur, Rastrow, Ariya, Bulyko, Ivan
Recent studies have found that model performance has a smooth power-law relationship, or scaling laws, with training data and model size, for a wide range of problems. These scaling laws allow one to choose nearly optimal data and model sizes. We stu…
External link:
http://arxiv.org/abs/2306.15815