Showing 1 - 10 of 1,486 for search: '"Yan Shuicheng"'
Author:
Zheng, Longtao, Zhang, Yifan, Guo, Hanzhong, Pan, Jiachun, Tan, Zhenxiong, Lu, Jiahao, Tang, Chuanxin, An, Bo, Yan, Shuicheng
Recent advances in video diffusion models have unlocked new potential for realistic audio-driven talking video generation. However, achieving seamless audio-lip synchronization, maintaining long-term identity consistency, and producing natural, audio…
External link:
http://arxiv.org/abs/2412.04448
Author:
Bai, Jinbin, Chow, Wei, Yang, Ling, Li, Xiangtai, Li, Juncheng, Zhang, Hanwang, Yan, Shuicheng
We present HumanEdit, a high-quality, human-rewarded dataset specifically designed for instruction-guided image editing, enabling precise and diverse image manipulations through open-form language instructions. Previous large-scale editing datasets o…
External link:
http://arxiv.org/abs/2412.04280
This paper presents UniVST, a unified framework for localized video style transfer based on a diffusion model. It operates without the need for training, offering a distinct advantage over existing diffusion methods that transfer style across entire vi…
External link:
http://arxiv.org/abs/2410.20084
The limited context window of contemporary large language models (LLMs) remains a huge barrier to their broader application across various domains. While continual pre-training on long-context data is a straightforward and effective solution, it incu…
External link:
http://arxiv.org/abs/2410.19318
Author:
Liu, Chris Yuhao, Zeng, Liang, Liu, Jiacai, Yan, Rui, He, Jujie, Wang, Chaojie, Yan, Shuicheng, Liu, Yang, Zhou, Yahui
In this report, we introduce a collection of methods to enhance reward modeling for LLMs, focusing specifically on data-centric techniques. We propose effective data selection and filtering strategies for curating high-quality open-source preference…
External link:
http://arxiv.org/abs/2410.18451
Author:
Zhang, Xinjie, Liu, Zhening, Zhang, Yifan, Ge, Xingtong, He, Dailan, Xu, Tongda, Wang, Yan, Lin, Zehong, Yan, Shuicheng, Zhang, Jun
4D Gaussian Splatting (4DGS) has recently emerged as a promising technique for capturing complex dynamic 3D scenes with high fidelity. It utilizes a 4D Gaussian representation and a GPU-friendly rasterizer, enabling rapid rendering speeds. Despite it…
External link:
http://arxiv.org/abs/2410.13613
In this work, we upgrade the multi-head attention mechanism, the core of the Transformer model, to improve efficiency while maintaining or surpassing the previous accuracy level. We show that multi-head attention can be expressed in the summation for…
External link:
http://arxiv.org/abs/2410.11842
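The summation view mentioned in the abstract above can be illustrated with standard multi-head attention. The following is a minimal NumPy sketch (not the paper's upgraded mechanism — the dimensions and weight names here are illustrative assumptions): the usual concatenate-then-project form equals a per-head sum, because the output projection splits row-wise across heads.

```python
# Sketch: standard multi-head attention, showing that
# concat(head_1..head_H) @ W_O == sum_h head_h @ W_O[rows of head h].
# All dimensions and weight names below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
T, d, H = 4, 8, 2        # tokens, model dim, number of heads
dh = d // H              # per-head dim

X = rng.standard_normal((T, d))
Wq = rng.standard_normal((H, d, dh))   # per-head query projections
Wk = rng.standard_normal((H, d, dh))   # per-head key projections
Wv = rng.standard_normal((H, d, dh))   # per-head value projections
Wo = rng.standard_normal((d, d))       # shared output projection

def softmax(a, axis=-1):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

heads = []
for h in range(H):
    q, k, v = X @ Wq[h], X @ Wk[h], X @ Wv[h]
    heads.append(softmax(q @ k.T / np.sqrt(dh)) @ v)

# Concatenation form: stack heads, then project once.
out_concat = np.concatenate(heads, axis=-1) @ Wo

# Summation form: each head multiplies only its own row-slice of Wo.
out_sum = sum(heads[h] @ Wo[h * dh:(h + 1) * dh] for h in range(H))

assert np.allclose(out_concat, out_sum)
```

The equivalence is just block matrix multiplication: concatenating heads along the feature axis and multiplying by `Wo` is the same as summing each head times its corresponding block of rows of `Wo`, which is the summation form the abstract refers to.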
Author:
Yang, Ling, Yu, Zhaochen, Zhang, Tianjun, Xu, Minkai, Gonzalez, Joseph E., Cui, Bin, Yan, Shuicheng
Large language models (LLMs) like GPT-4, PaLM, and LLaMA have shown significant improvements in various reasoning tasks. However, smaller models such as Llama-3-8B and DeepSeekMath-Base still struggle with complex mathematical reasoning because they…
External link:
http://arxiv.org/abs/2410.09008
Author:
Bai, Jinbin, Ye, Tian, Chow, Wei, Song, Enxin, Chen, Qing-Guo, Li, Xiangtai, Dong, Zhen, Zhu, Lei, Yan, Shuicheng
We present Meissonic, which elevates non-autoregressive masked image modeling (MIM) text-to-image to a level comparable with state-of-the-art diffusion models like SDXL. By incorporating a comprehensive suite of architectural innovations, advanced po…
External link:
http://arxiv.org/abs/2410.08261
3D Gaussian splatting (3DGS), known for its groundbreaking performance and efficiency, has become a dominant 3D representation and brought progress to many 3D vision tasks. However, in this work, we reveal a significant security vulnerability that ha…
External link:
http://arxiv.org/abs/2410.08190