Search results - "Nagaraja, Varun"

Report

High Fidelity Text-Guided Music Editing via Single-Stage Flow Matching

Authors: Lan, Gael Le, Shi, Bowen, Ni, Zhaoheng, Srinivasan, Sidd, Kumar, Anurag, Ellis, Brian, Kant, David, Nagaraja, Varun, Chang, Ernie, Hsu, Wei-Ning, Shi, Yangyang, Chandra, Vikas

We introduce MelodyFlow, an efficient text-controllable high-fidelity music generation and editing model. It operates on continuous latent representations from a low frame rate 48 kHz stereo variational autoencoder codec. Based on a diffusion transf

External link: http://arxiv.org/abs/2407.03648

Report

On The Open Prompt Challenge In Conditional Audio Generation

Authors: Chang, Ernie, Srinivasan, Sidd, Luthra, Mahi, Lin, Pin-Jie, Nagaraja, Varun, Iandola, Forrest, Liu, Zechun, Ni, Zhaoheng, Zhao, Changsheng, Shi, Yangyang, Chandra, Vikas

Text-to-audio generation (TTA) produces audio from a text description, learning from pairs of audio samples and hand-annotated text. However, commercializing audio generation is challenging as user-input prompts are often under-specified when compare

External link: http://arxiv.org/abs/2311.00897

Report

FoleyGen: Visually-Guided Audio Generation

Authors: Mei, Xinhao, Nagaraja, Varun, Lan, Gael Le, Ni, Zhaoheng, Chang, Ernie, Shi, Yangyang, Chandra, Vikas

Recent advancements in audio generation have been spurred by the evolution of large-scale deep learning models and expansive datasets. However, the task of video-to-audio (V2A) generation continues to be a challenge, principally because of the intric

External link: http://arxiv.org/abs/2309.10537

Report

Stack-and-Delay: a new codebook pattern for music generation

Authors: Lan, Gael Le, Nagaraja, Varun, Chang, Ernie, Kant, David, Ni, Zhaoheng, Shi, Yangyang, Iandola, Forrest, Chandra, Vikas

In language modeling based music generation, a generated waveform is represented by a sequence of hierarchical token stacks that can be decoded either in an auto-regressive manner or in parallel, depending on the codebook patterns. In particular, fla

External link: http://arxiv.org/abs/2309.08804

Report

Enhance audio generation controllability through representation similarity regularization

Authors: Shi, Yangyang, Lan, Gael Le, Nagaraja, Varun, Ni, Zhaoheng, Mei, Xinhao, Chang, Ernie, Iandola, Forrest, Liu, Yang, Chandra, Vikas

This paper presents an innovative approach to enhance control over audio generation by emphasizing the alignment between audio and text representations during model training. In the context of language model-based audio generation, the model leverage

External link: http://arxiv.org/abs/2309.08773

Report

Streaming Transformer Transducer Based Speech Recognition Using Non-Causal Convolution

Authors: Shi, Yangyang, Wu, Chunyang, Wang, Dilin, Xiao, Alex, Mahadeokar, Jay, Zhang, Xiaohui, Liu, Chunxi, Li, Ke, Shangguan, Yuan, Nagaraja, Varun, Kalinli, Ozlem, Seltzer, Mike

This paper improves the streaming transformer transducer for speech recognition by using non-causal convolution. Many works apply causal convolution to improve streaming transformers, ignoring the lookahead context. We propose to use non-causal con

External link: http://arxiv.org/abs/2110.05241

Report

Collaborative Training of Acoustic Encoders for Speech Recognition

Authors: Nagaraja, Varun, Shi, Yangyang, Venkatesh, Ganesh, Kalinli, Ozlem, Seltzer, Michael L., Chandra, Vikas

On-device speech recognition requires training models of different sizes for deploying on devices with various computational budgets. When building such different models, we can benefit from training them jointly to take advantage of the knowledge sh

External link: http://arxiv.org/abs/2106.08960

Report

Dynamic Encoder Transducer: A Flexible Solution For Trading Off Accuracy For Latency

Authors: Shi, Yangyang, Nagaraja, Varun, Wu, Chunyang, Mahadeokar, Jay, Le, Duc, Prabhavalkar, Rohit, Xiao, Alex, Yeh, Ching-Feng, Chan, Julian, Fuegen, Christian, Kalinli, Ozlem, Seltzer, Michael L.

We propose a dynamic encoder transducer (DET) for on-device speech recognition. One DET model scales to multiple devices with different computation capacities without retraining or finetuning. To trade off accuracy and latency, DET assigns differen

External link: http://arxiv.org/abs/2104.02176

Report

Modeling Context Between Objects for Referring Expression Understanding

Authors: Nagaraja, Varun K., Morariu, Vlad I., Davis, Larry S.

Referring expressions usually describe an object using properties of the object and relationships of the object with other objects. We propose a technique that integrates context between objects to understand referring expressions. Our approach uses

External link: http://arxiv.org/abs/1608.00525

Report

Searching for Objects using Structure in Indoor Scenes

Authors: Nagaraja, Varun K., Morariu, Vlad I., Davis, Larry S.

To identify the location of objects of a particular class, a passive computer vision system generally processes all the regions in an image to finally output a few regions. However, we can use structure in the scene to search for objects without proces

External link: http://arxiv.org/abs/1511.07710
