Showing 1 - 10 of 20 for search: '"Yin, Dacheng"'
Author:
Wang, Yanhui, Bao, Jianmin, Weng, Wenming, Feng, Ruoyu, Yin, Dacheng, Yang, Tao, Zhang, Jingxu, Dai, Qi, Zhao, Zhiyuan, Wang, Chunyu, Qiu, Kai, Yuan, Yuhui, Tang, Chuanxin, Sun, Xiaoyan, Luo, Chong, Guo, Baining
We present MicroCinema, a straightforward yet effective framework for high-quality and coherent text-to-video generation. Unlike existing approaches that align text prompts with video directly, MicroCinema introduces a Divide-and-Conquer strategy …
External link:
http://arxiv.org/abs/2311.18829
This paper explores the connection between the learning trajectories of Deep Neural Networks (DNNs) and their generalization capabilities when optimized using (stochastic) gradient descent algorithms. Instead of concentrating solely on the generalization …
External link:
http://arxiv.org/abs/2304.12579
Filler words like "um" or "uh" are common in spontaneous speech. It is desirable to automatically detect and remove them from recordings, as they affect the fluency, confidence, and professionalism of speech. Previous studies and our preliminary experiments …
External link:
http://arxiv.org/abs/2304.05922
In this paper, we present TridentSE, a novel architecture for speech enhancement, which is capable of efficiently capturing both global information and local details. TridentSE maintains T-F bin level representation to capture details, and uses a small …
External link:
http://arxiv.org/abs/2210.12995
Author:
Yin, Dacheng, Tang, Chuanxin, Liu, Yanqing, Wang, Xiaoqiang, Zhao, Zhiyuan, Zhao, Yucheng, Xiong, Zhiwei, Zhao, Sheng, Luo, Chong
This paper proposes a new "decompose-and-edit" paradigm for the text-based speech insertion task that facilitates arbitrary-length speech insertion and even full sentence generation. In the proposed paradigm, global and local factors in speech are …
External link:
http://arxiv.org/abs/2206.13865
This paper addresses the unsupervised learning of content-style decomposed representation. We first give a definition of style and then model the content-style representation as a token-level bipartite graph. An unsupervised framework, named Retriever, …
External link:
http://arxiv.org/abs/2202.12307
Given a piece of speech and its transcript text, text-based speech editing aims to generate speech that can be seamlessly inserted into the given speech by editing the transcript. Existing methods adopt a two-stage approach: synthesize the input text …
External link:
http://arxiv.org/abs/2109.05426
General-Purpose Speech Representation Learning through a Self-Supervised Multi-Granularity Framework
Author:
Zhao, Yucheng, Yin, Dacheng, Luo, Chong, Zhao, Zhiyuan, Tang, Chuanxin, Zeng, Wenjun, Zha, Zheng-Jun
This paper presents a self-supervised learning framework, named MGF, for general-purpose speech representation learning. In the design of MGF, the speech hierarchy is taken into consideration. Specifically, we propose to use generative learning approaches …
External link:
http://arxiv.org/abs/2102.01930
Time-frequency (T-F) domain masking is a mainstream approach for single-channel speech enhancement. Recently, focus has been put on phase prediction in addition to amplitude prediction. In this paper, we propose a phase-and-harmonics-aware deep neural network …
External link:
http://arxiv.org/abs/1911.04697
Published in:
Journal of Visual Communication and Image Representation, Vol. 89, November 2022