Search results - "Fang, Minghui"

Report

Zero-resource Hallucination Detection for Text Generation via Graph-based Contextual Knowledge Triples Modeling

Authors: Fang, Xinyue, Huang, Zhen, Tian, Zhiliang, Fang, Minghui, Pan, Ziyi, Fang, Quntian, Wen, Zhihua, Pan, Hengyue, Li, Dongsheng

LLMs obtain remarkable performance but suffer from hallucinations. Most research on detecting hallucination focuses on the questions with short and concrete correct answers that are easy to check the faithfulness. Hallucination detections for text ge

External link: http://arxiv.org/abs/2409.11283

Report

WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling

Authors: Ji, Shengpeng, Jiang, Ziyue, Cheng, Xize, Chen, Yifu, Fang, Minghui, Zuo, Jialong, Yang, Qian, Li, Ruiqi, Zhang, Ziang, Yang, Xiaoda, Huang, Rongjie, Jiang, Yidi, Chen, Qian, Zheng, Siqi, Wang, Wen, Zhao, Zhou

Language models have been effectively applied to modeling natural signals, such as images, video, speech, and audio. A crucial component of these models is the codec tokenizer, which compresses high-dimensional natural signals into lower-dimensional

External link: http://arxiv.org/abs/2408.16532

Report

ACE: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling

Authors: Fang, Minghui, Ji, Shengpeng, Zuo, Jialong, Huang, Hai, Xia, Yan, Zhu, Jieming, Cheng, Xize, Yang, Xiaoda, Liu, Wenrui, Wang, Gang, Dong, Zhenhua, Zhao, Zhou

Generative retrieval, which has demonstrated effectiveness in text-to-text retrieval, utilizes a sequence-to-sequence model to directly generate candidate identifiers based on natural language queries. Without explicitly computing the similarity betw

External link: http://arxiv.org/abs/2406.17507

Report

ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec

Authors: Ji, Shengpeng, Zuo, Jialong, Fang, Minghui, Zheng, Siqi, Chen, Qian, Wang, Wen, Jiang, Ziyue, Huang, Hai, Cheng, Xize, Huang, Rongjie, Zhao, Zhou

In this paper, we present ControlSpeech, a text-to-speech (TTS) system capable of fully cloning the speaker's voice and enabling arbitrary control and adjustment of speaking style, merely based on a few seconds of audio prompt and a simple textual st

External link: http://arxiv.org/abs/2406.01205

Report

Language-Codec: Reducing the Gaps Between Discrete Codec Representation and Speech Language Models

Authors: Ji, Shengpeng, Fang, Minghui, Jiang, Ziyue, Zheng, Siqi, Chen, Qian, Huang, Rongjie, Zuo, Jialung, Wang, Shulei, Zhao, Zhou

In recent years, large language models have achieved significant success in generative tasks (e.g., speech cloning and audio generation) related to speech, audio, music, and other signal domains. A crucial element of these models is the discrete acou

External link: http://arxiv.org/abs/2402.12208

Report

TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models

Authors: Ji, Shengpeng, Zuo, Jialong, Fang, Minghui, Jiang, Ziyue, Chen, Feiyang, Duan, Xinyu, Huai, Baoxing, Zhao, Zhou

Published in: 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Recently, there has been a growing interest in the field of controllable Text-to-Speech (TTS). While previous studies have relied on users providing specific style factor values based on acoustic knowledge or selecting reference speeches that meet ce

External link: http://arxiv.org/abs/2308.14430
