Showing 1 - 10 of 97 for the search: '"Zhou, Daquan"'
The efficacy of video generation models heavily depends on the quality of their training datasets. Most previous video generation models are trained on short video clips, while recently there has been increasing interest in training long video generation models…
External link: http://arxiv.org/abs/2410.10816
Author: Wang, Yuqing; Xiong, Tianwei; Zhou, Daquan; Lin, Zhijie; Zhao, Yang; Kang, Bingyi; Feng, Jiashi; Liu, Xihui
It is desirable but challenging to generate content-rich long videos on the scale of minutes. Autoregressive large language models (LLMs) have achieved great success in generating coherent and long sequences of tokens in the domain of natural language…
External link: http://arxiv.org/abs/2410.02757
For recent diffusion-based generative models, maintaining consistent content across a series of generated images, especially those containing subjects and complex details, presents a significant challenge. In this paper, we propose a new way of self-attention…
External link: http://arxiv.org/abs/2405.01434
Vision-language pre-training has significantly elevated performance across a wide range of image-language applications. Yet, the pre-training process for video-related tasks demands exceptionally large computational and data resources, which hinders…
External link: http://arxiv.org/abs/2404.16994
Dialogue state tracking (DST) aims to record user queries and goals during a conversational interaction, achieved by maintaining a predefined set of slots and their corresponding values. Current approaches decide slot values opaquely, while humans usually… (a minimal slot-value sketch follows the link below)
External link: http://arxiv.org/abs/2403.04656
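The abstract above describes DST as maintaining a predefined set of slots and their values. Here is a minimal sketch of that state representation; the restaurant-booking schema and the helper functions below are illustrative assumptions, not the paper's method:

from typing import Dict, Optional

# Predefined slot schema for a restaurant-booking domain (assumed example).
SLOTS = ("restaurant-area", "restaurant-food", "restaurant-pricerange")

def new_state() -> Dict[str, Optional[str]]:
    """Initialize every slot to None, i.e., not yet mentioned by the user."""
    return {slot: None for slot in SLOTS}

def update_state(state: Dict[str, Optional[str]],
                 turn_predictions: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Overwrite slots with values predicted for the current user turn.

    A real DST model would predict `turn_predictions` from the dialogue
    history; here they are supplied by hand to keep the sketch self-contained.
    """
    updated = dict(state)
    for slot, value in turn_predictions.items():
        if slot in updated:  # ignore predictions outside the schema
            updated[slot] = value
    return updated

if __name__ == "__main__":
    state = new_state()
    # User: "I want cheap Italian food." -> two slot values predicted.
    state = update_state(state, {"restaurant-food": "italian",
                                 "restaurant-pricerange": "cheap"})
    # User: "Somewhere in the centre, please."
    state = update_state(state, {"restaurant-area": "centre"})
    print(state)

The dialogue state after each turn is simply this slot-to-value map; tracking quality is typically measured by whether the whole map matches the gold annotation at every turn.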
The recently developed Sora model [1] has exhibited remarkable capabilities in video generation, sparking intense discussions regarding its ability to simulate real-world phenomena. Despite its growing popularity, there is a lack of established metrics…
External link: http://arxiv.org/abs/2402.17403
Author: Ma, Ze; Zhou, Daquan; Yeh, Chun-Hsiao; Wang, Xue-She; Li, Xiuyu; Yang, Huanrui; Dong, Zhen; Keutzer, Kurt; Feng, Jiashi
Creating content with specified identities (ID) has attracted significant interest in the field of generative models. In the field of text-to-image generation (T2I), subject-driven creation has achieved great progress with the identity controlled via…
External link: http://arxiv.org/abs/2402.09368
Author: Wang, Weimin; Liu, Jiawei; Lin, Zhijie; Yan, Jiangqiao; Chen, Shuo; Low, Chetwin; Hoang, Tuyen; Wu, Jie; Liew, Jun Hao; Yan, Hanshu; Zhou, Daquan; Feng, Jiashi
The growing demand for high-fidelity video generation from textual descriptions has catalyzed significant research in this field. In this work, we introduce MagicVideo-V2, which integrates the text-to-image model, video motion generator, reference image… (a staged-pipeline sketch follows the link below)
External link: http://arxiv.org/abs/2401.04468
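The abstract names the modules MagicVideo-V2 integrates (a text-to-image model, a video motion generator, and reference-image conditioning). A hedged sketch of how such a staged pipeline might compose; every function below is a hypothetical placeholder, not the actual MagicVideo-V2 implementation:

from typing import List

Frame = List[float]  # stand-in for image data

def text_to_image(prompt: str) -> Frame:
    """Stage 1 (assumed): synthesize a keyframe from the text prompt."""
    return [0.0] * 4

def video_motion_generator(keyframe: Frame, n_frames: int) -> List[Frame]:
    """Stage 2 (assumed): animate the keyframe into a short clip."""
    return [list(keyframe) for _ in range(n_frames)]

def condition_on_reference(frames: List[Frame], reference: Frame) -> List[Frame]:
    """Stage 3 (assumed): keep frames consistent with the reference image."""
    return frames

def generate_video(prompt: str, n_frames: int = 16) -> List[Frame]:
    keyframe = text_to_image(prompt)
    clip = video_motion_generator(keyframe, n_frames)
    return condition_on_reference(clip, reference=keyframe)

print(len(generate_video("a corgi surfing at sunset")))  # 16 frames

The design point the abstract suggests is modularity: each stage can be trained or swapped independently, with the keyframe doubling as the reference that anchors identity across frames.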
Author: Ma, Rui; Zhou, Qiang; Jin, Yizhu; Zhou, Daquan; Xiao, Bangjun; Li, Xiuyu; Qu, Yi; Singh, Aishani; Keutzer, Kurt; Hu, Jingtong; Xie, Xiaodong; Dong, Zhen; Zhang, Shanghang; Zhou, Shiji
Copyright law confers upon creators the exclusive rights to reproduce, distribute, and monetize their creative works. However, recent progress in text-to-image generation has introduced formidable challenges to copyright enforcement. These technologies…
External link: http://arxiv.org/abs/2403.12052
Transformers have astounding representational power but typically consume considerable computation, which grows quadratically with image resolution. The prevailing Swin Transformer reduces computational costs through a local window strategy. However, this strategy… (a cost-comparison sketch follows the link below)
External link: http://arxiv.org/abs/2312.08614
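The quadratic cost mentioned above comes from the pairwise attention matrix over all H*W tokens; a local window strategy caps each token's attention span. A back-of-the-envelope comparison (the feature-map and window sizes below are assumed for illustration):

# Global self-attention over N = H*W tokens costs O(N^2) pairwise scores;
# restricting attention to M x M windows costs O(N * M^2).
def global_attention_cost(height: int, width: int) -> int:
    """Pairwise scores for full self-attention: (H*W)^2."""
    n = height * width
    return n * n

def window_attention_cost(height: int, width: int, window: int) -> int:
    """Each token attends only within its M x M window: (H*W) * M^2."""
    n = height * width
    return n * window * window

if __name__ == "__main__":
    h = w = 56  # e.g., a 56x56 feature map (assumed)
    m = 7       # 7x7 windows, as in Swin-style attention
    print("global:", global_attention_cost(h, w))     # 9,834,496
    print("window:", window_attention_cost(h, w, m))  # 153,664

Swin-style models additionally shift the windows between successive layers so information can cross window borders; that shift does not change the cost argument above.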