Search results

Report

High Fidelity Text-Guided Music Generation and Editing via Single-Stage Flow Matching

Authors: Le Lan, Gael; Shi, Bowen; Ni, Zhaoheng; Srinivasan, Sidd; Kumar, Anurag; Ellis, Brian; Kant, David; Nagaraja, Varun; Chang, Ernie; Hsu, Wei-Ning; Shi, Yangyang; Chandra, Vikas

We introduce a simple and efficient text-controllable high-fidelity music generation and editing model. It operates on sequences of continuous latent representations from a low frame rate 48 kHz stereo variational auto encoder codec that eliminates t…

External link: http://arxiv.org/abs/2407.03648

Report

Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos

Authors: Chen, Changan; Peng, Puyuan; Baid, Ami; Xue, Zihui; Hsu, Wei-Ning; Harwath, David; Grauman, Kristen

Generating realistic audio for human interactions is important for many applications, such as creating sound effects for films or virtual reality games. Existing approaches implicitly assume total correspondence between the video and audio during tra…

External link: http://arxiv.org/abs/2406.09272

Report

Learning Fine-Grained Controllability on Speech Generation via Efficient Fine-Tuning

Authors: Chien, Chung-Ming; Tjandra, Andros; Vyas, Apoorv; Le, Matt; Shi, Bowen; Hsu, Wei-Ning

As the scale of generative models continues to grow, efficient reuse and adaptation of pre-trained models have become crucial considerations. In this work, we propose Voicebox Adapter, a novel approach that integrates fine-grained conditions into a p…

External link: http://arxiv.org/abs/2406.06251

Report

Universal Exact Compression of Differentially Private Mechanisms

Authors: Liu, Yanxiao; Chen, Wei-Ning; Özgür, Ayfer; Li, Cheuk Ting

To reduce the communication cost of differential privacy mechanisms, we introduce a novel construction, called Poisson private representation (PPR), designed to compress and simulate any local randomizer while ensuring local differential privacy. Unl…

External link: http://arxiv.org/abs/2405.20782
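The PPR abstract above describes compressing and simulating a local randomizer using shared randomness. As a rough illustration only, and assuming PPR builds on the Poisson functional representation, the Python sketch below shows that underlying idea for a finite alphabet: encoder and decoder share a seed, the encoder transmits only an index K, and the decoder regenerates a sample distributed as the target P. The privacy guarantee that distinguishes PPR is not modeled here, and the names `pfr_encode`, `pfr_decode`, and `pfr_sample` are hypothetical.

```python
import math
import random


def _shared_setup(q):
    # Fixed symbol order so encoder and decoder draw Z_i identically.
    symbols = list(q.keys())
    weights = [q[s] for s in symbols]
    return symbols, weights


def pfr_encode(p, q, seed):
    """Return K = argmin_i T_i / r(Z_i) with r = p/q, from a shared seed."""
    rng = random.Random(seed)
    symbols, weights = _shared_setup(q)
    # Assumption: q(z) > 0 wherever p(z) > 0, so M = max r(z) is finite.
    m = max(p[z] / q[z] for z in p if p[z] > 0)
    t, best_val, best_idx, i = 0.0, math.inf, 0, 0
    while True:
        i += 1
        t += rng.expovariate(1.0)                     # T_i: rate-1 Poisson arrivals
        z = rng.choices(symbols, weights=weights)[0]  # Z_i ~ Q
        # Candidates j >= i satisfy T_j / r(Z_j) >= T_i / M, so once that
        # bound reaches the current best, no later candidate can win.
        if t / m >= best_val:
            return best_idx
        r = p.get(z, 0.0) / q[z]
        if r > 0 and t / r < best_val:
            best_val, best_idx = t / r, i


def pfr_decode(q, seed, k):
    """Regenerate Z_K from the shared seed; the decoder never needs p."""
    rng = random.Random(seed)
    symbols, weights = _shared_setup(q)
    z = None
    for _ in range(k):
        rng.expovariate(1.0)                          # keep the stream in sync
        z = rng.choices(symbols, weights=weights)[0]
    return z


def pfr_sample(p, q, seed):
    """Encode then decode; over random seeds the result is distributed as p."""
    return pfr_decode(q, seed, pfr_encode(p, q, seed))
```

For example, with `p = {"a": 0.7, "b": 0.2, "c": 0.1}` and a uniform `q`, `pfr_sample(p, q, seed)` returns `"a"` for roughly 70% of seeds. The stopping rule relies on the bound M on p/q so the encoder halts after finitely many draws, and the index K it sends tends to be small when P is close to Q.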

Report

SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge

Authors: Wang, Andong; Wu, Bo; Chen, Sunli; Chen, Zhenfang; Guan, Haotian; Lee, Wei-Ning; Li, Li Erran; Gan, Chuang

Learning commonsense reasoning from visual contexts and scenes in the real world is a crucial step toward advanced artificial intelligence. However, existing video reasoning benchmarks are still inadequate since they were mainly designed for factual or s…

External link: http://arxiv.org/abs/2405.09713

Report

Improved Communication-Privacy Trade-offs in $L_2$ Mean Estimation under Streaming Differential Privacy

Authors: Chen, Wei-Ning; Isik, Berivan; Kairouz, Peter; No, Albert; Oh, Sewoong; Xu, Zheng

We study $L_2$ mean estimation under central differential privacy and communication constraints, and address two key challenges: firstly, existing mean estimation schemes that simultaneously handle both constraints are usually optimized for $L_\infty$…

External link: http://arxiv.org/abs/2405.02341

Report

Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization

Authors: Majumder, Navonil; Hung, Chia-Yu; Ghosal, Deepanway; Hsu, Wei-Ning; Mihalcea, Rada; Poria, Soujanya

Generative multimodal content is increasingly prevalent in much of the content creation arena, as it has the potential to allow artists and media personnel to create pre-production mockups by quickly bringing their ideas to life. The generation of au…

External link: http://arxiv.org/abs/2404.09956

Report

XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception

Authors: Han, HyoJung; Anwar, Mohamed; Pino, Juan; Hsu, Wei-Ning; Carpuat, Marine; Shi, Bowen; Wang, Changhan

Speech recognition and translation systems perform poorly on noisy inputs, which are frequent in realistic environments. Augmenting these systems with visual signals has the potential to improve robustness to noise. However, audio-visual (AV) data is…

External link: http://arxiv.org/abs/2403.14402

Report

$L_q$ Lower Bounds on Distributed Estimation via Fisher Information

Authors: Chen, Wei-Ning; Özgür, Ayfer

Van Trees inequality, also known as the Bayesian Cramér-Rao lower bound, is a powerful tool for establishing lower bounds for minimax estimation through Fisher information. It easily adapts to different statistical models and often yields tight bou…

External link: http://arxiv.org/abs/2402.01895
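For reference, the textbook scalar form of the Van Trees inequality named in the abstract above is stated below; it is not the $L_q$ bound derived in the paper. It assumes the usual regularity conditions: a scalar parameter $\theta \in \Theta$ with a smooth prior $\pi$ on a bounded interval $\Theta$ that vanishes at its endpoints, and a differentiable model with Fisher information $I_X(\theta)$. Because the worst-case risk is at least the average risk under any prior, the right-hand side also lower-bounds the minimax squared-error risk of any estimator $\hat\theta$:

$$
\sup_{\theta \in \Theta}\,\mathbb{E}_\theta\!\left[(\hat\theta(X)-\theta)^2\right]
\;\ge\;
\int_\Theta \mathbb{E}_\theta\!\left[(\hat\theta(X)-\theta)^2\right]\pi(\theta)\,d\theta
\;\ge\;
\frac{1}{\int_\Theta I_X(\theta)\,\pi(\theta)\,d\theta + J(\pi)},
\qquad
J(\pi)=\int_\Theta \frac{\pi'(\theta)^2}{\pi(\theta)}\,d\theta .
$$

Here $J(\pi)$ is the Fisher information of the prior itself; with $n$ i.i.d. samples, $I_X(\theta) = n\,I(\theta)$, so the bound decays as $1/n$ when the prior term is fixed.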
