Showing 1 - 10 of 20 for search: '"Yin, Dacheng"'
Author:
Wang, Yanhui, Bao, Jianmin, Weng, Wenming, Feng, Ruoyu, Yin, Dacheng, Yang, Tao, Zhang, Jingxu, Dai, Qi, Zhao, Zhiyuan, Wang, Chunyu, Qiu, Kai, Yuan, Yuhui, Tang, Chuanxin, Sun, Xiaoyan, Luo, Chong, Guo, Baining
We present MicroCinema, a straightforward yet effective framework for high-quality and coherent text-to-video generation. Unlike existing approaches that align text prompts with video directly, MicroCinema introduces a Divide-and-Conquer strategy …
External link:
http://arxiv.org/abs/2311.18829
This paper explores the connection between the learning trajectories of Deep Neural Networks (DNNs) and their generalization capabilities when optimized using (stochastic) gradient descent algorithms. Instead of concentrating solely on the generalization …
External link:
http://arxiv.org/abs/2304.12579
Filler words like "um" or "uh" are common in spontaneous speech. It is desirable to automatically detect and remove them from recordings, as they affect the fluency, confidence, and professionalism of speech. Previous studies and our preliminary experiments …
External link:
http://arxiv.org/abs/2304.05922
In this paper, we present TridentSE, a novel architecture for speech enhancement, which is capable of efficiently capturing both global information and local details. TridentSE maintains T-F bin level representation to capture details, and uses a small …
External link:
http://arxiv.org/abs/2210.12995
Author:
Yin, Dacheng, Tang, Chuanxin, Liu, Yanqing, Wang, Xiaoqiang, Zhao, Zhiyuan, Zhao, Yucheng, Xiong, Zhiwei, Zhao, Sheng, Luo, Chong
This paper proposes a new "decompose-and-edit" paradigm for the text-based speech insertion task that facilitates arbitrary-length speech insertion and even full sentence generation. In the proposed paradigm, global and local factors in speech are …
External link:
http://arxiv.org/abs/2206.13865
This paper addresses the unsupervised learning of content-style decomposed representation. We first give a definition of style and then model the content-style representation as a token-level bipartite graph. An unsupervised framework, named Retriever, …
External link:
http://arxiv.org/abs/2202.12307
Given a piece of speech and its transcript text, text-based speech editing aims to generate speech that can be seamlessly inserted into the given speech by editing the transcript. Existing methods adopt a two-stage approach: synthesize the input text …
External link:
http://arxiv.org/abs/2109.05426
General-Purpose Speech Representation Learning through a Self-Supervised Multi-Granularity Framework
Author:
Zhao, Yucheng, Yin, Dacheng, Luo, Chong, Zhao, Zhiyuan, Tang, Chuanxin, Zeng, Wenjun, Zha, Zheng-Jun
This paper presents a self-supervised learning framework, named MGF, for general-purpose speech representation learning. In the design of MGF, the speech hierarchy is taken into consideration. Specifically, we propose to use generative learning approaches …
External link:
http://arxiv.org/abs/2102.01930
Time-frequency (T-F) domain masking is a mainstream approach for single-channel speech enhancement. Recently, focus has been put on phase prediction in addition to amplitude prediction. In this paper, we propose a phase-and-harmonics-aware deep neural network …
External link:
http://arxiv.org/abs/1911.04697
Published in:
Journal of Visual Communication and Image Representation, Vol. 89, November 2022