Showing 1 - 10 of 567 for search: '"Shi,Yangyang"'
This paper derives the stochastic homogenization for two-dimensional Navier--Stokes equations with random coefficients. By means of the weak convergence method and the Stratonovich--Khasminskii averaging principle approach, the solution of two-dimensional Na
External link:
http://arxiv.org/abs/2412.12866
Deep learning models like Convolutional Neural Networks and transformers have shown impressive capabilities in speech verification, gaining considerable attention in the research community. However, CNN-based approaches struggle with modeling long-se
External link:
http://arxiv.org/abs/2412.10989
Authors:
Liu, Haohe, Lan, Gael Le, Mei, Xinhao, Ni, Zhaoheng, Kumar, Anurag, Nagaraja, Varun, Wang, Wenwu, Plumbley, Mark D., Shi, Yangyang, Chandra, Vikas
Video and audio are closely correlated modalities that humans naturally perceive together. While recent advancements have enabled the generation of audio or video from text, producing both modalities simultaneously still typically relies on either a
External link:
http://arxiv.org/abs/2412.15220
Time series forecasting is crucial in many fields, yet current deep learning models struggle with noise, data sparsity, and capturing complex multi-scale patterns. This paper presents MFF-FTNet, a novel framework addressing these challenges by combin
External link:
http://arxiv.org/abs/2411.17382
Authors:
Fedorov, Igor, Plawiak, Kate, Wu, Lemeng, Elgamal, Tarek, Suda, Naveen, Smith, Eric, Zhan, Hongyuan, Chi, Jianfeng, Hulovatyy, Yuriy, Patel, Kimish, Liu, Zechun, Zhao, Changsheng, Shi, Yangyang, Blankevoort, Tijmen, Pasupuleti, Mahesh, Soran, Bilge, Coudert, Zacharie Delpierre, Alao, Rachad, Krishnamoorthi, Raghuraman, Chandra, Vikas
This paper presents Llama Guard 3-1B-INT4, a compact and efficient Llama Guard model, which has been open-sourced to the community during Meta Connect 2024. We demonstrate that Llama Guard 3-1B-INT4 can be deployed on resource-constrained devices, ac
External link:
http://arxiv.org/abs/2411.17713
Authors:
Zhuge, Mingchen, Zhao, Changsheng, Ashley, Dylan, Wang, Wenyi, Khizbullin, Dmitrii, Xiong, Yunyang, Liu, Zechun, Chang, Ernie, Krishnamoorthi, Raghuraman, Tian, Yuandong, Shi, Yangyang, Chandra, Vikas, Schmidhuber, Jürgen
Contemporary evaluation techniques are inadequate for agentic systems. These approaches either focus exclusively on final outcomes -- ignoring the step-by-step nature of agentic systems, or require excessive manual labour. To address this, we introdu
External link:
http://arxiv.org/abs/2410.10934
Authors:
Chang, Ernie, Paltenghi, Matteo, Li, Yang, Lin, Pin-Jie, Zhao, Changsheng, Huber, Patrick, Liu, Zechun, Rabatin, Rastislav, Shi, Yangyang, Chandra, Vikas
Scaling laws in language modeling traditionally quantify training loss as a function of dataset size and model parameters, providing compute-optimal estimates but often neglecting the impact of data quality on model generalization. In this paper, we
External link:
http://arxiv.org/abs/2410.03083
Authors:
Chang, Ernie, Lin, Pin-Jie, Li, Yang, Zhao, Changsheng, Kim, Daeil, Rabatin, Rastislav, Liu, Zechun, Shi, Yangyang, Chandra, Vikas
Language model pretraining generally targets a broad range of use cases and incorporates data from diverse sources. However, there are instances where we desire a model that excels in specific areas without markedly compromising performance in other
External link:
http://arxiv.org/abs/2409.14705
Authors:
Lan, Gael Le, Shi, Bowen, Ni, Zhaoheng, Srinivasan, Sidd, Kumar, Anurag, Ellis, Brian, Kant, David, Nagaraja, Varun, Chang, Ernie, Hsu, Wei-Ning, Shi, Yangyang, Chandra, Vikas
We introduce MelodyFlow, an efficient text-controllable high-fidelity music generation and editing model. It operates on continuous latent representations from a low frame rate 48 kHz stereo variational auto encoder codec. Based on a diffusion transf
External link:
http://arxiv.org/abs/2407.03648
We introduce Speech ReaLLM, a new ASR architecture that marries "decoder-only" ASR with the RNN-T to make multimodal LLM architectures capable of real-time streaming. This is the first "decoder-only" ASR architecture designed to handle continuous aud
External link:
http://arxiv.org/abs/2406.09569