Search results - "Kumar Anurag"

Report

Scaling Concept With Text-Guided Diffusion Models

Authors: Huang, Chao; Liang, Susan; Tang, Yunlong; Tian, Yapeng; Kumar, Anurag; Xu, Chenliang

Text-guided diffusion models have revolutionized generative tasks by producing high-fidelity content from text descriptions. They have also enabled an editing paradigm where concepts can be replaced through text conditioning (e.g., a dog to a tiger).

External link: http://arxiv.org/abs/2410.24151

Report

Using RLHF to align speech enhancement approaches to mean-opinion quality scores

Authors: Kumar, Anurag; Perrault, Andrew; Williamson, Donald S.

Objective speech quality measures are typically used to assess speech enhancement algorithms, but it has been shown that they are sub-optimal as learning objectives because they do not always align well with human subjective ratings. This misalignmen

External link: http://arxiv.org/abs/2410.13182

Report

Language-Guided Joint Audio-Visual Editing via One-Shot Adaptation

Authors: Liang, Susan; Huang, Chao; Tian, Yapeng; Kumar, Anurag; Xu, Chenliang

In this paper, we introduce a novel task called language-guided joint audio-visual editing. Given an audio and image pair of a sounding event, this task aims at generating new audio-visual content by editing the given sounding event conditioned on th

External link: http://arxiv.org/abs/2410.07463

Report

Improved direction of arrival estimations with a wearable microphone array for dynamic environments by reliability weighting

Authors: Mitchell, Daniel A.; Rafaely, Boaz; Kumar, Anurag; Tourbabin, Vladimir

Direction-of-arrival estimation of multiple speakers in a room is an important task for a wide range of applications. In particular, challenging environments with moving speakers, reverberation and noise, lead to significant performance degradation f

External link: http://arxiv.org/abs/2409.14346

Report

Non-verbal Hands-free Control for Smart Glasses using Teeth Clicks

Authors: Mohapatra, Payal; Aroudi, Ali; Kumar, Anurag; Khaleghimeybodi, Morteza

Smart glasses are emerging as a popular wearable computing platform potentially revolutionizing the next generation of human-computer interaction. The widespread adoption of smart glasses has created a pressing need for discreet and hands-free contro

External link: http://arxiv.org/abs/2408.11346

Report

Utility Optimal Scheduling with a Slow Time-Scale Index-Bias for Achieving Rate Guarantees in Cellular Networks

Authors: Kumar, Anurag; Sundaresan, Rajesh

One of the requirements of network slicing in 5G networks is RAN (radio access network) scheduling with rate guarantees. We study a three-time-scale algorithm for maximum sum utility scheduling, with minimum rate constraints. As usual, the scheduler

External link: http://arxiv.org/abs/2408.09182

Report

Spherical World-Locking for Audio-Visual Localization in Egocentric Videos

Authors: Yun, Heeseung; Gao, Ruohan; Ananthabhotla, Ishwarya; Kumar, Anurag; Donley, Jacob; Li, Chao; Kim, Gunhee; Ithapu, Vamsi Krishna; Murdock, Calvin

Egocentric videos provide comprehensive contexts for user and scene understanding, spanning multisensory perception to behavioral interaction. We propose Spherical World-Locking (SWL) as a general framework for egocentric scene representation, which

External link: http://arxiv.org/abs/2408.05364

Report

High Fidelity Text-Guided Music Editing via Single-Stage Flow Matching

Authors: Lan, Gael Le; Shi, Bowen; Ni, Zhaoheng; Srinivasan, Sidd; Kumar, Anurag; Ellis, Brian; Kant, David; Nagaraja, Varun; Chang, Ernie; Hsu, Wei-Ning; Shi, Yangyang; Chandra, Vikas

We introduce MelodyFlow, an efficient text-controllable high-fidelity music generation and editing model. It operates on continuous latent representations from a low frame rate 48 kHz stereo variational auto encoder codec. Based on a diffusion transf

External link: http://arxiv.org/abs/2407.03648

Academic article

Social Acceptance and Section 377: A Case Study of Transgender People in Jammu City

Authors: Priya Parul; Kumar Anurag

Published in: Gender Studies, Vol 19, Iss 1, Pp 137-156 (2020)

The Supreme Court of India recently decriminalized section 377 of the Indian Penal Code to outlaw the unfair violence and discrimination against transgender people. The paper argues that despite the legal acceptance of Section 377, the discrimination

External link: https://doaj.org/article/bcaf27c318ed45eea1d04d1e68c0bd25

Report

AV-CrossNet: an Audiovisual Complex Spectral Mapping Network for Speech Separation By Leveraging Narrow- and Cross-Band Modeling

Authors: Kalkhorani, Vahid Ahmadi; Yu, Cheng; Kumar, Anurag; Tan, Ke; Xu, Buye; Wang, DeLiang

Adding visual cues to audio-based speech separation can improve separation performance. This paper introduces AV-CrossNet, an audiovisual (AV) system for speech enhancement, target speaker extraction, and multi-talker speaker separation. AV-CrossNet

External link: http://arxiv.org/abs/2406.11619
