Showing 1 - 10 of 30 for search: '"Shivakumar, Prashanth Gurunath"'
Author:
Kolehmainen, Jari, Gourav, Aditya, Shivakumar, Prashanth Gurunath, Gu, Yile, Gandhe, Ankur, Rastrow, Ariya, Strimel, Grant, Bulyko, Ivan
Retrieval is a widely adopted approach for improving language models leveraging external information. As the field moves towards multi-modal large language models, it is important to extend the pure text based methods to incorporate other modalities
External link:
http://arxiv.org/abs/2406.09618
Author:
Everson, Kevin, Gu, Yile, Yang, Huck, Shivakumar, Prashanth Gurunath, Lin, Guan-Ting, Kolehmainen, Jari, Bulyko, Ivan, Gandhe, Ankur, Ghosh, Shalini, Hamza, Wael, Lee, Hung-yi, Rastrow, Ariya, Stolcke, Andreas
In the realm of spoken language understanding (SLU), numerous natural language understanding (NLU) methodologies have been adapted by supplying large language models (LLMs) with transcribed speech instead of conventional written text. In real-world s
External link:
http://arxiv.org/abs/2401.02921
Author:
Lin, Guan-Ting, Shivakumar, Prashanth Gurunath, Gandhe, Ankur, Yang, Chao-Han Huck, Gu, Yile, Ghosh, Shalini, Stolcke, Andreas, Lee, Hung-yi, Bulyko, Ivan
Large Language Models (LLMs) have demonstrated superior abilities in tasks such as chatting, reasoning, and question-answering. However, standard LLMs may ignore crucial paralinguistic information, such as sentiment, emotion, and speaking style, whic
External link:
http://arxiv.org/abs/2312.15316
Author:
Shivakumar, Prashanth Gurunath, Kolehmainen, Jari, Gu, Yile, Gandhe, Ankur, Rastrow, Ariya, Bulyko, Ivan
Second pass rescoring is a critical component of competitive automatic speech recognition (ASR) systems. Large language models have demonstrated their ability in using pre-trained information for better rescoring of ASR hypothesis. Discriminative tra
External link:
http://arxiv.org/abs/2310.06248
Author:
Kolehmainen, Jari, Gu, Yile, Gourav, Aditya, Shivakumar, Prashanth Gurunath, Gandhe, Ankur, Rastrow, Ariya, Bulyko, Ivan
Recognition of personalized content remains a challenge in end-to-end speech recognition. We explore three novel approaches that use personalized content in a neural rescoring step to improve recognition: gazetteers, prompting, and a cross-attention
External link:
http://arxiv.org/abs/2307.06832
Author:
Gu, Yile, Shivakumar, Prashanth Gurunath, Kolehmainen, Jari, Gandhe, Ankur, Rastrow, Ariya, Bulyko, Ivan
Recent studies have found that model performance has a smooth power-law relationship, or scaling laws, with training data and model size, for a wide range of problems. These scaling laws allow one to choose nearly optimal data and model sizes. We stu
External link:
http://arxiv.org/abs/2306.15815
Author:
Shivakumar, Prashanth Gurunath, Kolehmainen, Jari, Gu, Yile, Gandhe, Ankur, Rastrow, Ariya, Bulyko, Ivan
Second-pass rescoring is employed in most state-of-the-art speech recognition systems. Recently, BERT based models have gained popularity for re-ranking the n-best hypothesis by exploiting the knowledge from masked language model pre-training. Furthe
External link:
http://arxiv.org/abs/2306.09452
Automatic inference of important paralinguistic information such as age from speech is an important area of research with numerous spoken language technology based applications. Speaker age estimation has applications in enabling personalization and
External link:
http://arxiv.org/abs/2109.01568
A key desideratum for inclusive and accessible speech recognition technology is ensuring its robust performance to children's speech. Notably, this includes the rapidly advancing neural network based end-to-end speech recognition systems. Children spe
External link:
http://arxiv.org/abs/2102.09918
Word vector representations enable machines to encode human language for spoken language understanding and processing. Confusion2vec, motivated from human speech production and perception, is a word vector representation which encodes ambiguities pre
External link:
http://arxiv.org/abs/2102.02270