Zobrazeno 1 - 10
of 8 287
pro vyhledávání: '"Wei, Ning"'
Autor:
Lan, Gael Le, Shi, Bowen, Ni, Zhaoheng, Srinivasan, Sidd, Kumar, Anurag, Ellis, Brian, Kant, David, Nagaraja, Varun, Chang, Ernie, Hsu, Wei-Ning, Shi, Yangyang, Chandra, Vikas
We introduce a simple and efficient text-controllable high-fidelity music generation and editing model. It operates on sequences of continuous latent representations from a low frame rate 48 kHz stereo variational auto encoder codec that eliminates t
Externí odkaz:
http://arxiv.org/abs/2407.03648
Autor:
Chen, Changan, Peng, Puyuan, Baid, Ami, Xue, Zihui, Hsu, Wei-Ning, Harwath, David, Grauman, Kristen
Generating realistic audio for human interactions is important for many applications, such as creating sound effects for films or virtual reality games. Existing approaches implicitly assume total correspondence between the video and audio during tra
Externí odkaz:
http://arxiv.org/abs/2406.09272
As the scale of generative models continues to grow, efficient reuse and adaptation of pre-trained models have become crucial considerations. In this work, we propose Voicebox Adapter, a novel approach that integrates fine-grained conditions into a p
Externí odkaz:
http://arxiv.org/abs/2406.06251
To reduce the communication cost of differential privacy mechanisms, we introduce a novel construction, called Poisson private representation (PPR), designed to compress and simulate any local randomizer while ensuring local differential privacy. Unl
Externí odkaz:
http://arxiv.org/abs/2405.20782
Autor:
Wang, Andong, Wu, Bo, Chen, Sunli, Chen, Zhenfang, Guan, Haotian, Lee, Wei-Ning, Li, Li Erran, Gan, Chuang
Learning commonsense reasoning from visual contexts and scenes in real-world is a crucial step toward advanced artificial intelligence. However, existing video reasoning benchmarks are still inadequate since they were mainly designed for factual or s
Externí odkaz:
http://arxiv.org/abs/2405.09713
We study $L_2$ mean estimation under central differential privacy and communication constraints, and address two key challenges: firstly, existing mean estimation schemes that simultaneously handle both constraints are usually optimized for $L_\infty
Externí odkaz:
http://arxiv.org/abs/2405.02341
Autor:
Majumder, Navonil, Hung, Chia-Yu, Ghosal, Deepanway, Hsu, Wei-Ning, Mihalcea, Rada, Poria, Soujanya
Generative multimodal content is increasingly prevalent in much of the content creation arena, as it has the potential to allow artists and media personnel to create pre-production mockups by quickly bringing their ideas to life. The generation of au
Externí odkaz:
http://arxiv.org/abs/2404.09956
Autor:
Han, HyoJung, Anwar, Mohamed, Pino, Juan, Hsu, Wei-Ning, Carpuat, Marine, Shi, Bowen, Wang, Changhan
Speech recognition and translation systems perform poorly on noisy inputs, which are frequent in realistic environments. Augmenting these systems with visual signals has the potential to improve robustness to noise. However, audio-visual (AV) data is
Externí odkaz:
http://arxiv.org/abs/2403.14402
Autor:
Chen, Wei-Ning, Özgür, Ayfer
Van Trees inequality, also known as the Bayesian Cram\'er-Rao lower bound, is a powerful tool for establishing lower bounds for minimax estimation through Fisher information. It easily adapts to different statistical models and often yields tight bou
Externí odkaz:
http://arxiv.org/abs/2402.01895
Autor:
Vyas, Apoorv, Shi, Bowen, Le, Matthew, Tjandra, Andros, Wu, Yi-Chiao, Guo, Baishan, Zhang, Jiemin, Zhang, Xinyue, Adkins, Robert, Ngan, William, Wang, Jeff, Cruz, Ivan, Akula, Bapi, Akinyemi, Akinniyi, Ellis, Brian, Moritz, Rashel, Yungster, Yael, Rakotoarison, Alice, Tan, Liang, Summers, Chris, Wood, Carleigh, Lane, Joshua, Williamson, Mary, Hsu, Wei-Ning
Audio is an essential part of our life, but creating it often requires expertise and is time-consuming. Research communities have made great progress over the past year advancing the performance of large scale audio generative models for a single mod
Externí odkaz:
http://arxiv.org/abs/2312.15821