Search results - "kumar, Anurag"

Report

Scaling Concept With Text-Guided Diffusion Models

Authors: Huang, Chao; Liang, Susan; Tang, Yunlong; Tian, Yapeng; Kumar, Anurag; Xu, Chenliang

Text-guided diffusion models have revolutionized generative tasks by producing high-fidelity content from text descriptions. They have also enabled an editing paradigm where concepts can be replaced through text conditioning (e.g., a dog to a tiger).

External link: http://arxiv.org/abs/2410.24151

Report

Using RLHF to align speech enhancement approaches to mean-opinion quality scores

Authors: Kumar, Anurag; Perrault, Andrew; Williamson, Donald S.

Objective speech quality measures are typically used to assess speech enhancement algorithms, but it has been shown that they are sub-optimal as learning objectives because they do not always align well with human subjective ratings. This misalignmen

External link: http://arxiv.org/abs/2410.13182

Report

Language-Guided Joint Audio-Visual Editing via One-Shot Adaptation

Authors: Liang, Susan; Huang, Chao; Tian, Yapeng; Kumar, Anurag; Xu, Chenliang

In this paper, we introduce a novel task called language-guided joint audio-visual editing. Given an audio and image pair of a sounding event, this task aims at generating new audio-visual content by editing the given sounding event conditioned on th

External link: http://arxiv.org/abs/2410.07463

Report

Improved direction of arrival estimations with a wearable microphone array for dynamic environments by reliability weighting

Authors: Mitchell, Daniel A.; Rafaely, Boaz; Kumar, Anurag; Tourbabin, Vladimir

Direction-of-arrival estimation of multiple speakers in a room is an important task for a wide range of applications. In particular, challenging environments with moving speakers, reverberation, and noise lead to significant performance degradation f

External link: http://arxiv.org/abs/2409.14346

Report

Non-verbal Hands-free Control for Smart Glasses using Teeth Clicks

Authors: Mohapatra, Payal; Aroudi, Ali; Kumar, Anurag; Khaleghimeybodi, Morteza

Smart glasses are emerging as a popular wearable computing platform potentially revolutionizing the next generation of human-computer interaction. The widespread adoption of smart glasses has created a pressing need for discreet and hands-free contro

External link: http://arxiv.org/abs/2408.11346

Report

Utility Optimal Scheduling with a Slow Time-Scale Index-Bias for Achieving Rate Guarantees in Cellular Networks

Authors: Kumar, Anurag; Sundaresan, Rajesh

One of the requirements of network slicing in 5G networks is RAN (radio access network) scheduling with rate guarantees. We study a three-time-scale algorithm for maximum sum utility scheduling, with minimum rate constraints. As usual, the scheduler

External link: http://arxiv.org/abs/2408.09182

Report

Spherical World-Locking for Audio-Visual Localization in Egocentric Videos

Authors: Yun, Heeseung; Gao, Ruohan; Ananthabhotla, Ishwarya; Kumar, Anurag; Donley, Jacob; Li, Chao; Kim, Gunhee; Ithapu, Vamsi Krishna; Murdock, Calvin

Egocentric videos provide comprehensive contexts for user and scene understanding, spanning multisensory perception to behavioral interaction. We propose Spherical World-Locking (SWL) as a general framework for egocentric scene representation, which

External link: http://arxiv.org/abs/2408.05364

Report

High Fidelity Text-Guided Music Editing via Single-Stage Flow Matching

Authors: Lan, Gael Le; Shi, Bowen; Ni, Zhaoheng; Srinivasan, Sidd; Kumar, Anurag; Ellis, Brian; Kant, David; Nagaraja, Varun; Chang, Ernie; Hsu, Wei-Ning; Shi, Yangyang; Chandra, Vikas

We introduce MelodyFlow, an efficient text-controllable high-fidelity music generation and editing model. It operates on continuous latent representations from a low frame rate 48 kHz stereo variational auto encoder codec. Based on a diffusion transf

External link: http://arxiv.org/abs/2407.03648

Report

AV-CrossNet: an Audiovisual Complex Spectral Mapping Network for Speech Separation By Leveraging Narrow- and Cross-Band Modeling

Authors: Kalkhorani, Vahid Ahmadi; Yu, Cheng; Kumar, Anurag; Tan, Ke; Xu, Buye; Wang, DeLiang

Adding visual cues to audio-based speech separation can improve separation performance. This paper introduces AV-CrossNet, an audiovisual (AV) system for speech enhancement, target speaker extraction, and multi-talker speaker separation. AV-CrossNet

External link: http://arxiv.org/abs/2406.11619

Report

URGENT Challenge: Universality, Robustness, and Generalizability For Speech Enhancement

Authors: Zhang, Wangyou; Scheibler, Robin; Saijo, Kohei; Cornell, Samuele; Li, Chenda; Ni, Zhaoheng; Kumar, Anurag; Pirklbauer, Jan; Sach, Marvin; Watanabe, Shinji; Fingscheidt, Tim; Qian, Yanmin

The last decade has witnessed significant advancements in deep learning-based speech enhancement (SE). However, most existing SE research has limitations on the coverage of SE sub-tasks, data diversity and amount, and evaluation metrics. To fill this

External link: http://arxiv.org/abs/2406.04660
