Search results - "Shivakumar, Prashanth Gurunath"

Report

Multi-Modal Retrieval For Large Language Model Based Speech Recognition

Authors: Kolehmainen, Jari; Gourav, Aditya; Shivakumar, Prashanth Gurunath; Gu, Yile; Gandhe, Ankur; Rastrow, Ariya; Strimel, Grant; Bulyko, Ivan

Retrieval is a widely adopted approach for improving language models by leveraging external information. As the field moves towards multi-modal large language models, it is important to extend the purely text-based methods to incorporate other modalities

External link: http://arxiv.org/abs/2406.09618

Report

Towards ASR Robust Spoken Language Understanding Through In-Context Learning With Word Confusion Networks

Authors: Everson, Kevin; Gu, Yile; Yang, Huck; Shivakumar, Prashanth Gurunath; Lin, Guan-Ting; Kolehmainen, Jari; Bulyko, Ivan; Gandhe, Ankur; Ghosh, Shalini; Hamza, Wael; Lee, Hung-yi; Rastrow, Ariya; Stolcke, Andreas

In the realm of spoken language understanding (SLU), numerous natural language understanding (NLU) methodologies have been adapted by supplying large language models (LLMs) with transcribed speech instead of conventional written text. In real-world s

External link: http://arxiv.org/abs/2401.02921

Report

Paralinguistics-Enhanced Large Language Modeling of Spoken Dialogue

Authors: Lin, Guan-Ting; Shivakumar, Prashanth Gurunath; Gandhe, Ankur; Yang, Chao-Han Huck; Gu, Yile; Ghosh, Shalini; Stolcke, Andreas; Lee, Hung-yi; Bulyko, Ivan

Large Language Models (LLMs) have demonstrated superior abilities in tasks such as chatting, reasoning, and question-answering. However, standard LLMs may ignore crucial paralinguistic information, such as sentiment, emotion, and speaking style, whic

External link: http://arxiv.org/abs/2312.15316

Report

Discriminative Speech Recognition Rescoring with Pre-trained Language Models

Authors: Shivakumar, Prashanth Gurunath; Kolehmainen, Jari; Gu, Yile; Gandhe, Ankur; Rastrow, Ariya; Bulyko, Ivan

Second-pass rescoring is a critical component of competitive automatic speech recognition (ASR) systems. Large language models have demonstrated their ability to use pre-trained information for better rescoring of ASR hypotheses. Discriminative tra

External link: http://arxiv.org/abs/2310.06248

Report

Personalization for BERT-based Discriminative Speech Recognition Rescoring

Authors: Kolehmainen, Jari; Gu, Yile; Gourav, Aditya; Shivakumar, Prashanth Gurunath; Gandhe, Ankur; Rastrow, Ariya; Bulyko, Ivan

Recognition of personalized content remains a challenge in end-to-end speech recognition. We explore three novel approaches that use personalized content in a neural rescoring step to improve recognition: gazetteers, prompting, and a cross-attention

External link: http://arxiv.org/abs/2307.06832

Report

Scaling Laws for Discriminative Speech Recognition Rescoring Models

Authors: Gu, Yile; Shivakumar, Prashanth Gurunath; Kolehmainen, Jari; Gandhe, Ankur; Rastrow, Ariya; Bulyko, Ivan

Recent studies have found that model performance has a smooth power-law relationship, or scaling laws, with training data and model size, for a wide range of problems. These scaling laws allow one to choose nearly optimal data and model sizes. We stu

External link: http://arxiv.org/abs/2306.15815

Report

Distillation Strategies for Discriminative Speech Recognition Rescoring

Authors: Shivakumar, Prashanth Gurunath; Kolehmainen, Jari; Gu, Yile; Gandhe, Ankur; Rastrow, Ariya; Bulyko, Ivan

Second-pass rescoring is employed in most state-of-the-art speech recognition systems. Recently, BERT-based models have gained popularity for re-ranking the n-best hypotheses by exploiting the knowledge from masked language model pre-training. Furthe

External link: http://arxiv.org/abs/2306.09452

Report

Phone Duration Modeling for Speaker Age Estimation in Children

Authors: Shivakumar, Prashanth Gurunath; Bishop, Somer; Lord, Catherine; Narayanan, Shrikanth

Automatic inference of important paralinguistic information such as age from speech is an important area of research with numerous spoken language technology based applications. Speaker age estimation has applications in enabling personalization and

External link: http://arxiv.org/abs/2109.01568

Report

End-to-End Neural Systems for Automatic Children Speech Recognition: An Empirical Study

Authors: Shivakumar, Prashanth Gurunath; Narayanan, Shrikanth

A key desideratum for inclusive and accessible speech recognition technology is ensuring its robust performance on children's speech. Notably, this includes the rapidly advancing neural-network-based end-to-end speech recognition systems. Children spe

External link: http://arxiv.org/abs/2102.09918

Report

Confusion2vec 2.0: Enriching Ambiguous Spoken Language Representations with Subwords

Authors: Shivakumar, Prashanth Gurunath; Georgiou, Panayiotis; Narayanan, Shrikanth

Word vector representations enable machines to encode human language for spoken language understanding and processing. Confusion2vec, motivated by human speech production and perception, is a word vector representation which encodes ambiguities pre

External link: http://arxiv.org/abs/2102.02270
