Showing 1 - 10 of 1,478 for search: '"P. Bhuvana"'
Accurately measuring the geometry and spatially-varying reflectance of real-world objects is a complex task due to their intricate shapes formed by concave features, hollow engravings and diverse surfaces, resulting in inter-reflection and occlusion
External link:
http://arxiv.org/abs/2412.09772
Author:
R., Bhuvana G., Nandi, Anuj
Extragalactic Black Hole X-ray Binaries (BH-XRBs) are the most intriguing X-ray sources as some of them are 'home' to the most massive stellar-mass BHs ever found. In this work, we conduct a comprehensive study of three massive, eclipsing extragalact
External link:
http://arxiv.org/abs/2411.17047
Author:
Richardson, Christopher, Sharma, Roshan, Gaur, Neeraj, Haghani, Parisa, Sundar, Anirudh, Ramabhadran, Bhuvana
Zero-shot domain adaptation for dialogue state tracking (DST) remains a challenging problem in task-oriented dialogue (TOD) systems, where models must generalize to target domains unseen at training time. Current large language model approaches for z
External link:
http://arxiv.org/abs/2411.00150
Author:
Biadsy, Fadi, Chen, Youzheng, Elias, Isaac, Kastner, Kyle, Wang, Gary, Rosenberg, Andrew, Ramabhadran, Bhuvana
In this paper, we introduce a zero-shot Voice Transfer (VT) module that can be seamlessly integrated into a multi-lingual Text-to-speech (TTS) system to transfer an individual's voice across languages. Our proposed VT module comprises a speaker-encod
External link:
http://arxiv.org/abs/2409.13910
Author:
Vashishth, Shikhar, Singh, Harman, Bharadwaj, Shikhar, Ganapathy, Sriram, Asawaroengchai, Chulayuth, Audhkhasi, Kartik, Rosenberg, Andrew, Bapna, Ankur, Ramabhadran, Bhuvana
Representing speech as discrete tokens provides a framework for transforming speech into a format that closely resembles text, thus enabling the use of speech as an input to the widely successful large language models (LLMs). Currently, while several
External link:
http://arxiv.org/abs/2409.02384
This paper explores speculative speech recognition (SSR), where we empower conventional automatic speech recognition (ASR) with speculation capabilities, allowing the recognizer to run ahead of audio. We introduce a metric for measuring SSR performan
External link:
http://arxiv.org/abs/2407.04641
In this paper, we focus on addressing the constraints faced when applying LLMs to ASR. Recent works utilize prefixLM-type models, which directly apply speech as a prefix to LLMs for ASR. We have found that optimizing speech prefixes leads to better A
External link:
http://arxiv.org/abs/2406.14701
Author:
Gaur, Neeraj, Agrawal, Rohan, Wang, Gary, Haghani, Parisa, Rosenberg, Andrew, Ramabhadran, Bhuvana
This paper introduces ASTRA, a novel method for improving Automatic Speech Recognition (ASR) through text injection. Unlike prevailing techniques, ASTRA eliminates the need for sampling to match sequence lengths between speech and text modalities. Ins
External link:
http://arxiv.org/abs/2406.06664
Author:
Meng, Zhong, Wu, Zelin, Prabhavalkar, Rohit, Peyser, Cal, Wang, Weiran, Chen, Nanxin, Sainath, Tara N., Ramabhadran, Bhuvana
Published in:
Interspeech 2024, Kos Island, Greece
Neural contextual biasing effectively improves automatic speech recognition (ASR) for crucial phrases within a speaker's context, particularly those that are infrequent in the training data. This work proposes contextual text injection (CTI) to enhan
External link:
http://arxiv.org/abs/2406.02921