Search results - "Wang, Zhisheng"

Report

Optimizing Neural Speech Codec for Low-Bitrate Compression via Multi-Scale Encoding

Authors: Yang, Peiji, Wang, Fengping, Zhong, Yicheng, Wei, Huawei, Wang, Zhisheng

Neural speech codecs have demonstrated their ability to compress high-quality speech and audio by converting them into discrete token representations. Most existing methods utilize Residual Vector Quantization (RVQ) to encode speech into multiple lay

External link: http://arxiv.org/abs/2410.15749

Report

Geometric Artifact Correction for Symmetric Multi-Linear Trajectory CT: Theory, Method, and Generalization

Authors: Wang, Zhisheng, Sun, Yanxu, Li, Shangyu, Lin, Legeng, Wang, Shunli, Cui, Junning

For extending CT field-of-view to perform non-destructive testing, the Symmetric Multi-Linear trajectory Computed Tomography (SMLCT) has been developed as a successful example of non-standard CT scanning modes. However, inevitable geometric errors ca

External link: http://arxiv.org/abs/2408.15069

Report

Spontaneous Style Text-to-Speech Synthesis with Controllable Spontaneous Behaviors Based on Language Models

Authors: Li, Weiqin, Yang, Peiji, Zhong, Yicheng, Zhou, Yixuan, Wang, Zhisheng, Wu, Zhiyong, Wu, Xixin, Meng, Helen

Spontaneous style speech synthesis, which aims to generate human-like speech, often encounters challenges due to the scarcity of high-quality data and limitations in model capabilities. Recent language model-based TTS systems can be trained on large,

External link: http://arxiv.org/abs/2407.13509

Report

AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation

Authors: Wei, Huawei, Yang, Zejun, Wang, Zhisheng

In this study, we propose AniPortrait, a novel framework for generating high-quality animation driven by audio and a reference portrait image. Our methodology is divided into two stages. Initially, we extract 3D intermediate representations from audi

External link: http://arxiv.org/abs/2403.17694

Report

3D Visibility-aware Generalizable Neural Radiance Fields for Interacting Hands

Authors: Huang, Xuan, Li, Hanhui, Yang, Zejun, Wang, Zhisheng, Liang, Xiaodan

Neural radiance fields (NeRFs) are promising 3D representations for scenes, objects, and humans. However, most existing methods require multi-view inputs and per-scene training, which limits their real-life applications. Moreover, current methods foc

External link: http://arxiv.org/abs/2401.00979

Report

Monocular 3D Hand Mesh Recovery via Dual Noise Estimation

Authors: Li, Hanhui, Lin, Xiaojian, Huang, Xuan, Yang, Zejun, Wang, Zhisheng, Liang, Xiaodan

Current parametric models have made notable progress in 3D hand pose and shape estimation. However, due to the fixed hand topology and complex hand poses, current models are hard to generate meshes that are aligned with the image well. To tackle this

External link: http://arxiv.org/abs/2312.15916

Report

OSNet & MNetO: Two Types of General Reconstruction Architectures for Linear Computed Tomography in Multi-Scenarios

Authors: Wang, Zhisheng, Deng, Zihan, Liu, Fenglin, Huang, Yixing, Yu, Haijun, Cui, Junning

Recently, linear computed tomography (LCT) systems have actively attracted attention. To weaken projection truncation and image the region of interest (ROI) for LCT, the backprojection filtration (BPF) algorithm is an effective solution. However, in

External link: http://arxiv.org/abs/2309.11858

Report

ExpCLIP: Bridging Text and Facial Expressions via Semantic Alignment

Authors: Zhong, Yicheng, Wei, Huawei, Yang, Peiji, Wang, Zhisheng

The objective of stylized speech-driven facial animation is to create animations that encapsulate specific emotional expressions. Existing methods often depend on pre-established emotional labels or facial expression templates, which may limit the ne

External link: http://arxiv.org/abs/2308.14448

Report

LongDanceDiff: Long-term Dance Generation with Conditional Diffusion Model

Authors: Yang, Siqi, Yang, Zejun, Wang, Zhisheng

Dancing with music is always an essential human art form to express emotion. Due to the high temporal-spacial complexity, long-term 3D realist dance generation synchronized with music is challenging. Existing methods suffer from the freezing problem

External link: http://arxiv.org/abs/2308.11945

Report

Analytical reconstructions of full-scan multiple source-translation computed tomography under large field of views

Authors: Wang, Zhisheng, Liu, Yue, Wang, Shunli, Bian, Xingyuan, Li, Zongfeng, Cui, Junning

This paper is to investigate the high-quality analytical reconstructions of multiple source-translation computed tomography (mSTCT) under an extended field of view (FOV). Under the larger FOVs, the previously proposed backprojection filtration (BPF)

External link: http://arxiv.org/abs/2305.19767
