Search results

Report

Acquisition of Spatially-Varying Reflectance and Surface Normals via Polarized Reflectance Fields

Authors: Yang, Jing; Prasad, Pratusha Bhuvana; Zhang, Qing; Zhao, Yajie

Accurately measuring the geometry and spatially-varying reflectance of real-world objects is a complex task due to their intricate shapes formed by concave features, hollow engravings and diverse surfaces, resulting in inter-reflection and occlusion

External link: http://arxiv.org/abs/2412.09772

Report

Accretion disc dynamics in extragalactic black hole X-ray binaries: A comprehensive study of M33 X-7, NGC 300 X-1 and IC 10 X-1

Authors: R., Bhuvana G.; Nandi, Anuj

Extragalactic Black Hole X-ray Binaries (BH-XRBs) are the most intriguing X-ray sources as some of them are 'home' to the most massive stellar-mass BHs ever found. In this work, we conduct a comprehensive study of three massive, eclipsing extragalact

External link: http://arxiv.org/abs/2411.17047

Report

Schema Augmentation for Zero-Shot Domain Adaptation in Dialogue State Tracking

Authors: Richardson, Christopher; Sharma, Roshan; Gaur, Neeraj; Haghani, Parisa; Sundar, Anirudh; Ramabhadran, Bhuvana

Zero-shot domain adaptation for dialogue state tracking (DST) remains a challenging problem in task-oriented dialogue (TOD) systems, where models must generalize to target domains unseen at training time. Current large language model approaches for z

External link: http://arxiv.org/abs/2411.00150

Report

Zero-shot Cross-lingual Voice Transfer for TTS

Authors: Biadsy, Fadi; Chen, Youzheng; Elias, Isaac; Kastner, Kyle; Wang, Gary; Rosenberg, Andrew; Ramabhadran, Bhuvana

In this paper, we introduce a zero-shot Voice Transfer (VT) module that can be seamlessly integrated into a multi-lingual Text-to-speech (TTS) system to transfer an individual's voice across languages. Our proposed VT module comprises a speaker-encod

External link: http://arxiv.org/abs/2409.13910

Report

STAB: Speech Tokenizer Assessment Benchmark

Authors: Vashishth, Shikhar; Singh, Harman; Bharadwaj, Shikhar; Ganapathy, Sriram; Asawaroengchai, Chulayuth; Audhkhasi, Kartik; Rosenberg, Andrew; Bapna, Ankur; Ramabhadran, Bhuvana

Representing speech as discrete tokens provides a framework for transforming speech into a format that closely resembles text, thus enabling the use of speech as an input to the widely successful large language models (LLMs). Currently, while several

External link: http://arxiv.org/abs/2409.02384

Report

Speculative Speech Recognition by Audio-Prefixed Low-Rank Adaptation of Language Models

Authors: Yusuf, Bolaji; Baskar, Murali Karthick; Rosenberg, Andrew; Ramabhadran, Bhuvana

This paper explores speculative speech recognition (SSR), where we empower conventional automatic speech recognition (ASR) with speculation capabilities, allowing the recognizer to run ahead of audio. We introduce a metric for measuring SSR performan

External link: http://arxiv.org/abs/2407.04641

Report

Speech Prefix-Tuning with RNNT Loss for Improving LLM Predictions

Authors: Baskar, Murali Karthick; Rosenberg, Andrew; Ramabhadran, Bhuvana; Gaur, Neeraj; Meng, Zhong

In this paper, we focus on addressing the constraints faced when applying LLMs to ASR. Recent works utilize prefixLM-type models, which directly apply speech as a prefix to LLMs for ASR. We have found that optimizing speech prefixes leads to better A

External link: http://arxiv.org/abs/2406.14701

Report

ASTRA: Aligning Speech and Text Representations for Asr without Sampling

Authors: Gaur, Neeraj; Agrawal, Rohan; Wang, Gary; Haghani, Parisa; Rosenberg, Andrew; Ramabhadran, Bhuvana

This paper introduces ASTRA, a novel method for improving Automatic Speech Recognition (ASR) through text injection. Unlike prevailing techniques, ASTRA eliminates the need for sampling to match sequence lengths between speech and text modalities. Ins

External link: http://arxiv.org/abs/2406.06664

Report

Text Injection for Neural Contextual Biasing

Authors: Meng, Zhong; Wu, Zelin; Prabhavalkar, Rohit; Peyser, Cal; Wang, Weiran; Chen, Nanxin; Sainath, Tara N.; Ramabhadran, Bhuvana

Published in: Interspeech 2024, Kos Island, Greece

Neural contextual biasing effectively improves automatic speech recognition (ASR) for crucial phrases within a speaker's context, particularly those that are infrequent in the training data. This work proposes contextual text injection (CTI) to enhan

External link: http://arxiv.org/abs/2406.02921
