Showing 1 - 10 of 212 for search: '"Jin, Zeyu"'
Diffusion models have demonstrated significant potential in speech synthesis tasks, including text-to-speech (TTS) and voice cloning. However, their iterative denoising processes are inefficient and hinder the application of end-to-end optimization …
External link:
http://arxiv.org/abs/2410.11097
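The snippet above points to the cost of iterative denoising. As a rough illustration only, the following minimal DDPM-style ancestral sampler in plain NumPy (not the paper's method; denoiser is a hypothetical stand-in for a trained noise predictor) shows why sampling cost grows with the number of steps: the network is called once per step.

import numpy as np

def ddpm_sample(denoiser, shape, num_steps=50, seed=0):
    """Ancestral DDPM-style sampling: start from noise, denoise step by step."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    x = rng.standard_normal(shape)            # x_T ~ N(0, I)
    for t in reversed(range(num_steps)):      # one full network call per step
        eps_hat = denoiser(x, t)              # predicted noise at step t
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:                             # inject noise except at the final step
            x += np.sqrt(betas[t]) * rng.standard_normal(shape)
    return x

# toy usage: a dummy "denoiser" that predicts zero noise on a mel-sized grid
samples = ddpm_sample(lambda x, t: np.zeros_like(x), shape=(80, 100))

Each loop iteration is a full forward pass of the denoising network, which is why sampling with many steps, and any end-to-end optimization through them, becomes expensive.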
Neural codecs have demonstrated strong performance in high-fidelity compression of audio signals at low bitrates. The token-based representations produced by these codecs have proven particularly useful for generative modeling. While much research has …
External link:
http://arxiv.org/abs/2410.11025
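As a minimal sketch of the token-based representations mentioned above, the code below encodes a waveform into discrete codec tokens and decodes it back. It assumes the pretrained EnCodec checkpoint and the EncodecModel/AutoProcessor interface from Hugging Face transformers, which is an assumption on my part; the paper may use a different codec, and the API may differ across library versions.

import numpy as np
from transformers import AutoProcessor, EncodecModel

model = EncodecModel.from_pretrained("facebook/encodec_24khz")
processor = AutoProcessor.from_pretrained("facebook/encodec_24khz")

# one second of silent dummy audio at the codec's sampling rate
audio = np.zeros(processor.sampling_rate, dtype=np.float32)
inputs = processor(raw_audio=audio, sampling_rate=processor.sampling_rate,
                   return_tensors="pt")

# encode() yields discrete codebook indices ("tokens"), one stream per quantizer;
# these token sequences are what generative models are typically trained on
encoded = model.encode(inputs["input_values"], inputs["padding_mask"])
tokens = encoded.audio_codes        # (chunks, batch, codebooks, frames)

# decoding the tokens reconstructs the waveform at the codec's low bitrate
audio_out = model.decode(encoded.audio_codes, encoded.audio_scales,
                         inputs["padding_mask"])[0]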
Author:
Zhou, Yixuan, Qin, Xiaoyu, Jin, Zeyu, Zhou, Shuoyi, Lei, Shun, Zhou, Songtao, Wu, Zhiyong, Jia, Jia
Recent AIGC systems possess the capability to generate digital multimedia content based on human language instructions, such as text, image and video. However, when it comes to speech, existing methods related to human instruction-to-speech generation …
External link:
http://arxiv.org/abs/2408.15676
Achieving robust speech separation for overlapping speakers in various acoustic environments with noise and reverberation remains an open challenge. Although existing datasets are available to train separators for specific scenarios, they do not effectively …
External link:
http://arxiv.org/abs/2408.16126
Author:
Jin, Zeyu, Jia, Jia, Wang, Qixin, Li, Kehan, Zhou, Shuoyi, Zhou, Songtao, Qin, Xiaoyu, Wu, Zhiyong
Speech-language multi-modal learning presents a significant challenge due to the fine, nuanced information inherent in speech styles. Therefore, a large-scale dataset providing elaborate comprehension of speech style is urgently needed to facilitate …
External link:
http://arxiv.org/abs/2408.13608
Author:
Ghosh, Sreyan, Evuru, Chandra Kiran Reddy, Kumar, Sonal, Tyagi, Utkarsh, Nieto, Oriol, Jin, Zeyu, Manocha, Dinesh
Large Vision-Language Models (LVLMs) often produce responses that misalign with factual information, a phenomenon known as hallucinations. While hallucinations are well-studied, the exact causes behind them remain underexplored. In this paper, we first …
External link:
http://arxiv.org/abs/2405.15683
Author:
Ghosh, Sreyan, Evuru, Chandra Kiran Reddy, Kumar, Sonal, S, Ramaneswaran, Aneja, Deepali, Jin, Zeyu, Duraiswami, Ramani, Manocha, Dinesh
Instruction Tuning (IT), the process of training large language models (LLMs) using instruction-response pairs, has emerged as the predominant method for transforming base pre-trained LLMs into open-domain conversational agents. While IT has achieved …
External link:
http://arxiv.org/abs/2402.05119
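For concreteness, an instruction-response pair of the kind referred to above might look like the following; the field names and prompt template are illustrative assumptions, not taken from the paper or any specific dataset.

# A single instruction-tuning example and one common way to flatten it
# into a training prompt (schema and template are assumed, not the paper's).
example = {
    "instruction": "Summarize the following sentence in five words or fewer.",
    "input": "The committee postponed the vote until next week.",
    "response": "Vote delayed until next week.",
}

prompt = (f"### Instruction:\n{example['instruction']}\n\n"
          f"### Input:\n{example['input']}\n\n"
          f"### Response:\n{example['response']}")
print(prompt)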
This paper investigates the zero relaxation limit for general linear hyperbolic relaxation systems and establishes the asymptotic convergence of slow variables under the unimprovable weakest stability condition, akin to the Lax equivalence theorem for …
External link:
http://arxiv.org/abs/2311.10662
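For orientation, a linear hyperbolic relaxation system is often written in the schematic form below; this is a standard textbook setup assumed here for illustration, and the paper's exact formulation and its weakest stability condition may differ.

\[
  \partial_t U + \sum_{j=1}^{d} A_j\,\partial_{x_j} U \;=\; \frac{1}{\varepsilon}\, Q\, U,
  \qquad U = U(x,t) \in \mathbb{R}^n,\quad \varepsilon > 0,
\]

with constant coefficient matrices $A_j$ and a relaxation matrix $Q$ whose null space defines the equilibrium manifold. Splitting $U = (u, v)$ into slow (conserved) variables $u$ and fast variables $v$, the zero relaxation limit asks whether $u^{\varepsilon}$ converges, as $\varepsilon \to 0$, to the solution of the reduced equilibrium system obtained by enforcing $QU = 0$; the analogy with the Lax equivalence theorem is that a suitable stability condition together with formal consistency of this reduction yields convergence.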
A promising approach to investigating high-dimensional problems is to identify their intrinsically low-dimensional features, which can be achieved through recently developed techniques for effective low-dimensional representation of functions such as …
External link:
http://arxiv.org/abs/2310.12799
Spoken language recognition (SLR) is the task of automatically identifying the language present in a speech signal. Existing SLR models are either too computationally expensive or too large to run effectively on devices with limited resources. …
External link:
http://arxiv.org/abs/2306.01945