Showing 1 - 10 of 212 for search: '"Jin, Zeyu"'
Diffusion models have demonstrated significant potential in speech synthesis tasks, including text-to-speech (TTS) and voice cloning. However, their iterative denoising processes are inefficient and hinder the application of end-to-end optimization …
External link:
http://arxiv.org/abs/2410.11097
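The snippet above points to the cost of iterative denoising. As a rough illustration only, the following minimal DDPM-style ancestral sampler in plain NumPy (not the paper's method; denoiser is a hypothetical stand-in for a trained noise predictor) shows why sampling cost grows with the number of steps: the network is called once per step.

import numpy as np

def ddpm_sample(denoiser, shape, num_steps=50, seed=0):
    """Ancestral DDPM-style sampling: start from noise, denoise step by step."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    x = rng.standard_normal(shape)            # x_T ~ N(0, I)
    for t in reversed(range(num_steps)):      # one full network call per step
        eps_hat = denoiser(x, t)              # predicted noise at step t
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:                             # inject noise except at the final step
            x += np.sqrt(betas[t]) * rng.standard_normal(shape)
    return x

# toy usage: a dummy "denoiser" that predicts zero noise on a mel-sized grid
samples = ddpm_sample(lambda x, t: np.zeros_like(x), shape=(80, 100))

Each loop iteration is a full forward pass of the denoising network, which is why sampling with many steps, and any end-to-end optimization through them, becomes expensive.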
Neural codecs have demonstrated strong performance in high-fidelity compression of audio signals at low bitrates. The token-based representations produced by these codecs have proven particularly useful for generative modeling. While much research has …
External link:
http://arxiv.org/abs/2410.11025
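As a minimal sketch of the token-based representations mentioned above, the code below encodes a waveform into discrete codec tokens and decodes it back. It assumes the pretrained EnCodec checkpoint and the EncodecModel/AutoProcessor interface from Hugging Face transformers, which is an assumption on my part; the paper may use a different codec, and the API may differ across library versions.

import numpy as np
from transformers import AutoProcessor, EncodecModel

model = EncodecModel.from_pretrained("facebook/encodec_24khz")
processor = AutoProcessor.from_pretrained("facebook/encodec_24khz")

# one second of silent dummy audio at the codec's sampling rate
audio = np.zeros(processor.sampling_rate, dtype=np.float32)
inputs = processor(raw_audio=audio, sampling_rate=processor.sampling_rate,
                   return_tensors="pt")

# encode() yields discrete codebook indices ("tokens"), one stream per quantizer;
# these token sequences are what generative models are typically trained on
encoded = model.encode(inputs["input_values"], inputs["padding_mask"])
tokens = encoded.audio_codes        # (chunks, batch, codebooks, frames)

# decoding the tokens reconstructs the waveform at the codec's low bitrate
audio_out = model.decode(encoded.audio_codes, encoded.audio_scales,
                         inputs["padding_mask"])[0]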
Author:
Zhou, Yixuan, Qin, Xiaoyu, Jin, Zeyu, Zhou, Shuoyi, Lei, Shun, Zhou, Songtao, Wu, Zhiyong, Jia, Jia
Recent AIGC systems possess the capability to generate digital multimedia content based on human language instructions, such as text, image and video. However, when it comes to speech, existing methods related to human instruction-to-speech generation …
External link:
http://arxiv.org/abs/2408.15676
Achieving robust speech separation for overlapping speakers in various acoustic environments with noise and reverberation remains an open challenge. Although existing datasets are available to train separators for specific scenarios, they do not effectively …
External link:
http://arxiv.org/abs/2408.16126
Author:
Jin, Zeyu, Jia, Jia, Wang, Qixin, Li, Kehan, Zhou, Shuoyi, Zhou, Songtao, Qin, Xiaoyu, Wu, Zhiyong
Speech-language multi-modal learning presents a significant challenge due to the fine, nuanced information inherent in speech styles. Therefore, a large-scale dataset providing elaborate comprehension of speech style is urgently needed to facilitate …
External link:
http://arxiv.org/abs/2408.13608
Author:
Ghosh, Sreyan, Evuru, Chandra Kiran Reddy, Kumar, Sonal, Tyagi, Utkarsh, Nieto, Oriol, Jin, Zeyu, Manocha, Dinesh
Large Vision-Language Models (LVLMs) often produce responses that misalign with factual information, a phenomenon known as hallucinations. While hallucinations are well-studied, the exact causes behind them remain underexplored. In this paper, we first …
External link:
http://arxiv.org/abs/2405.15683
Author:
Ghosh, Sreyan, Evuru, Chandra Kiran Reddy, Kumar, Sonal, S, Ramaneswaran, Aneja, Deepali, Jin, Zeyu, Duraiswami, Ramani, Manocha, Dinesh
Instruction Tuning (IT), the process of training large language models (LLMs) using instruction-response pairs, has emerged as the predominant method for transforming base pre-trained LLMs into open-domain conversational agents. While IT has achieved …
External link:
http://arxiv.org/abs/2402.05119
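For concreteness, an instruction-response pair of the kind referred to above might look like the following; the field names and prompt template are illustrative assumptions, not taken from the paper or any specific dataset.

# A single instruction-tuning example and one common way to flatten it
# into a training prompt (schema and template are assumed, not the paper's).
example = {
    "instruction": "Summarize the following sentence in five words or fewer.",
    "input": "The committee postponed the vote until next week.",
    "response": "Vote delayed until next week.",
}

prompt = (f"### Instruction:\n{example['instruction']}\n\n"
          f"### Input:\n{example['input']}\n\n"
          f"### Response:\n{example['response']}")
print(prompt)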
This paper investigates the zero relaxation limit for general linear hyperbolic relaxation systems and establishes the asymptotic convergence of slow variables under the unimprovable weakest stability condition, akin to the Lax equivalence theorem for …
External link:
http://arxiv.org/abs/2311.10662
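For orientation, a linear hyperbolic relaxation system is often written in the schematic form below; this is a standard textbook setup assumed here for illustration, and the paper's exact formulation and its weakest stability condition may differ.

\[
  \partial_t U + \sum_{j=1}^{d} A_j\,\partial_{x_j} U \;=\; \frac{1}{\varepsilon}\, Q\, U,
  \qquad U = U(x,t) \in \mathbb{R}^n,\quad \varepsilon > 0,
\]

with constant coefficient matrices $A_j$ and a relaxation matrix $Q$ whose null space defines the equilibrium manifold. Splitting $U = (u, v)$ into slow (conserved) variables $u$ and fast variables $v$, the zero relaxation limit asks whether $u^{\varepsilon}$ converges, as $\varepsilon \to 0$, to the solution of the reduced equilibrium system obtained by enforcing $QU = 0$; the analogy with the Lax equivalence theorem is that a suitable stability condition together with formal consistency of this reduction yields convergence.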
A promising approach to investigating high-dimensional problems is to identify their intrinsically low-dimensional features, which can be achieved through recently developed techniques for effective low-dimensional representation of functions such as …
External link:
http://arxiv.org/abs/2310.12799
Spoken language recognition (SLR) is the task of automatically identifying the language present in a speech signal. Existing SLR models are either too computationally expensive or too large to run effectively on devices with limited resources. …
External link:
http://arxiv.org/abs/2306.01945