Showing 1 - 10 of 952 for search: '"Kumar Anurag"'
Text-guided diffusion models have revolutionized generative tasks by producing high-fidelity content from text descriptions. They have also enabled an editing paradigm where concepts can be replaced through text conditioning (e.g., a dog to a tiger).
External link:
http://arxiv.org/abs/2410.24151
Objective speech quality measures are typically used to assess speech enhancement algorithms, but it has been shown that they are sub-optimal as learning objectives because they do not always align well with human subjective ratings. This misalignmen
External link:
http://arxiv.org/abs/2410.13182
In this paper, we introduce a novel task called language-guided joint audio-visual editing. Given an audio and image pair of a sounding event, this task aims at generating new audio-visual content by editing the given sounding event conditioned on th
External link:
http://arxiv.org/abs/2410.07463
Direction-of-arrival estimation of multiple speakers in a room is an important task for a wide range of applications. In particular, challenging environments with moving speakers, reverberation and noise, lead to significant performance degradation f
External link:
http://arxiv.org/abs/2409.14346
Smart glasses are emerging as a popular wearable computing platform potentially revolutionizing the next generation of human-computer interaction. The widespread adoption of smart glasses has created a pressing need for discreet and hands-free contro
External link:
http://arxiv.org/abs/2408.11346
Author:
Kumar, Anurag, Sundaresan, Rajesh
One of the requirements of network slicing in 5G networks is RAN (radio access network) scheduling with rate guarantees. We study a three-time-scale algorithm for maximum sum utility scheduling, with minimum rate constraints. As usual, the scheduler
External link:
http://arxiv.org/abs/2408.09182
Author:
Yun, Heeseung, Gao, Ruohan, Ananthabhotla, Ishwarya, Kumar, Anurag, Donley, Jacob, Li, Chao, Kim, Gunhee, Ithapu, Vamsi Krishna, Murdock, Calvin
Egocentric videos provide comprehensive contexts for user and scene understanding, spanning multisensory perception to behavioral interaction. We propose Spherical World-Locking (SWL) as a general framework for egocentric scene representation, which
External link:
http://arxiv.org/abs/2408.05364
Author:
Lan, Gael Le, Shi, Bowen, Ni, Zhaoheng, Srinivasan, Sidd, Kumar, Anurag, Ellis, Brian, Kant, David, Nagaraja, Varun, Chang, Ernie, Hsu, Wei-Ning, Shi, Yangyang, Chandra, Vikas
We introduce MelodyFlow, an efficient text-controllable high-fidelity music generation and editing model. It operates on continuous latent representations from a low frame rate 48 kHz stereo variational auto encoder codec. Based on a diffusion transf
External link:
http://arxiv.org/abs/2407.03648
Author:
Priya Parul, Kumar Anurag
Published in:
Gender Studies, Vol 19, Iss 1, Pp 137-156 (2020)
The Supreme Court of India recently decriminalized section 377 of the Indian Penal Code to outlaw the unfair violence and discrimination against transgender people. The paper argues that despite the legal acceptance of Section 377, the discrimination
External link:
https://doaj.org/article/bcaf27c318ed45eea1d04d1e68c0bd25
Adding visual cues to audio-based speech separation can improve separation performance. This paper introduces AV-CrossNet, an audiovisual (AV) system for speech enhancement, target speaker extraction, and multi-talker speaker separation. AV-CrossNet
External link:
http://arxiv.org/abs/2406.11619