Search results - "Wang, Yuancheng"

Report

Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation

Authors: He, Haorui, Shang, Zengqiang, Wang, Chaoren, Li, Xuyuan, Gu, Yicheng, Hua, Hua, Liu, Liwei, Yang, Chen, Li, Jiaqi, Shi, Peiyang, Wang, Yuancheng, Chen, Kai, Zhang, Pengyuan, Wu, Zhizheng

Recently, speech generation models have made significant progress by using large-scale training data. However, the research community struggles to produce highly spontaneous and human-like speech due to the lack of large-scale, diverse, and spontaneou

External link: http://arxiv.org/abs/2407.05361

Report

FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds

Authors: Zhang, Yiming, Gu, Yicheng, Zeng, Yanhong, Xing, Zhening, Wang, Yuancheng, Wu, Zhizheng, Chen, Kai

We study Neural Foley, the automatic generation of high-quality sound effects synchronizing with videos, enabling an immersive audio-visual experience. Despite its wide range of applications, existing approaches encounter limitations when it comes to

External link: http://arxiv.org/abs/2407.01494

Report

SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words

Authors: Ao, Junyi, Wang, Yuancheng, Tian, Xiaohai, Chen, Dekun, Zhang, Jun, Lu, Lu, Wang, Yuxuan, Li, Haizhou, Wu, Zhizheng

Speech encompasses a wealth of information, including but not limited to content, paralinguistic, and environmental information. This comprehensive nature of speech significantly impacts communication and is crucial for human-computer interaction. Ch

External link: http://arxiv.org/abs/2406.13340

Report

RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis

Authors: Xin, Detai, Tan, Xu, Shen, Kai, Ju, Zeqian, Yang, Dongchao, Wang, Yuancheng, Takamichi, Shinnosuke, Saruwatari, Hiroshi, Liu, Shujie, Li, Jinyu, Zhao, Sheng

We present RALL-E, a robust language modeling method for text-to-speech (TTS) synthesis. While previous work based on large language models (LLMs) shows impressive performance on zero-shot TTS, such methods often suffer from poor robustness, such as

External link: http://arxiv.org/abs/2404.03204

Report

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

Authors: Ju, Zeqian, Wang, Yuancheng, Shen, Kai, Tan, Xu, Xin, Detai, Yang, Dongchao, Liu, Yanqing, Leng, Yichong, Song, Kaitao, Tang, Siliang, Wu, Zhizheng, Qin, Tao, Li, Xiang-Yang, Ye, Wei, Zhang, Shikun, Bian, Jiang, He, Lei, Li, Jinyu, Zhao, Sheng

While recent large-scale text-to-speech (TTS) models have achieved significant progress, they still fall short in speech quality, similarity, and prosody. Considering speech intricately encompasses various attributes (e.g., content, prosody, timbre,

External link: http://arxiv.org/abs/2403.03100

Academic article

Numerical modeling of the horizontal flow and concentration distribution of nitrogen within a stored-paddy bulk in a large warehouse

Authors: Wang, Yuancheng, Li, Fujun, Cao, Yang, Wei, Lei, Cui, Hongying

Published in: Julius-Kühn-Archiv, Vol 463, Iss 1, Pp 395-400 (2018)

The insect population in grain stores can be kept under control by maintaining a high concentration of N2 gas throughout the grain bed. The development of controlled atmosphere storage technology for insect control requires an accurate prediction of

External link: https://doaj.org/article/e8e156c6f9c4403881120833582a2c24

Report

Amphion: An Open-Source Audio, Music and Speech Generation Toolkit

Authors: Zhang, Xueyao, Xue, Liumeng, Gu, Yicheng, Wang, Yuancheng, He, Haorui, Wang, Chaoren, Chen, Xi, Fang, Zihao, Chen, Haopeng, Zhang, Junan, Tang, Tze Ying, Zou, Lexiao, Wang, Mingxuan, Han, Jun, Chen, Kai, Li, Haizhou, Wu, Zhizheng

Amphion is an open-source toolkit for Audio, Music, and Speech Generation that aims to ease the way for junior researchers and engineers into these fields. It presents a unified framework that covers diverse generation tasks and models, wit

External link: http://arxiv.org/abs/2312.09911

Report

Trustworthy Multi-phase Liver Tumor Segmentation via Evidence-based Uncertainty

Authors: Hu, Chuanfei, Xia, Tianyi, Cui, Ying, Zou, Quchen, Wang, Yuancheng, Xiao, Wenbo, Ju, Shenghong, Li, Xinde

Multi-phase liver contrast-enhanced computed tomography (CECT) images convey complementary multi-phase information for liver tumor segmentation (LiTS), which is crucial for assisting the clinical diagnosis of liver cancer. However, the performance

External link: http://arxiv.org/abs/2305.05344

Report

AUDIT: Audio Editing by Following Instructions with Latent Diffusion Models

Authors: Wang, Yuancheng, Ju, Zeqian, Tan, Xu, He, Lei, Wu, Zhizheng, Bian, Jiang, Zhao, Sheng

Audio editing is applicable for various purposes, such as adding background sound effects, replacing a musical instrument, and repairing damaged audio. Recently, some diffusion-based methods achieved zero-shot audio editing by using a diffusion and d

External link: http://arxiv.org/abs/2304.00830

Report

Automated Testing of Image Captioning Systems

Authors: Yu, Boxi, Zhong, Zhiqing, Qin, Xinran, Yao, Jiayi, Wang, Yuancheng, He, Pinjia

Image captioning (IC) systems, which automatically generate a text description of the salient objects in an image (real or synthetic), have seen great progress over the past few years due to the development of deep neural networks. IC plays an indisp

External link: http://arxiv.org/abs/2206.06550
