Showing 1 - 10 of 27 for search: '"Long, Fuchen"'
Recent advances in text-to-video generation have demonstrated the utility of powerful diffusion models. Nevertheless, the problem is not trivial when shaping diffusion models to animate a static image (i.e., image-to-video generation). The difficulty o…
External link:
http://arxiv.org/abs/2403.17005
Diffusion models are just at a tipping point for the image super-resolution task. Nevertheless, it is not trivial to capitalize on diffusion models for video super-resolution, which necessitates not only the preservation of visual appearance from low-resolution…
External link:
http://arxiv.org/abs/2403.17000
The recent innovations and breakthroughs in diffusion models have significantly expanded the possibilities of generating high-quality videos for the given prompts. Most existing works tackle the single-scene scenario with only one video event occurring…
External link:
http://arxiv.org/abs/2401.01256
Video temporal dynamics are conventionally modeled with a 3D spatial-temporal kernel or its factorized version comprised of a 2D spatial kernel and a 1D temporal kernel. The modeling power, nevertheless, is limited by the fixed window size and static weight…
External link:
http://arxiv.org/abs/2211.08252
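For context, the factorized design this abstract refers to replaces a full 3D kernel with a 2D spatial convolution followed by a 1D temporal convolution. The sketch below is a minimal illustration of that conventional baseline, not the paper's proposed module; the class and parameter names are my own.

```python
# Minimal sketch of a (2+1)D factorized convolution: a 2D spatial kernel
# followed by a 1D temporal kernel, as an alternative to a full 3D kernel.
# Illustrative only; the fixed window size (k_t) and static weights are
# exactly the limitation the abstract points out.
import torch
import torch.nn as nn

class Factorized3DConv(nn.Module):
    def __init__(self, in_ch, out_ch, k_s=3, k_t=3):
        super().__init__()
        # 2D spatial kernel: 1 x k_s x k_s over (time, height, width)
        self.spatial = nn.Conv3d(in_ch, out_ch, kernel_size=(1, k_s, k_s),
                                 padding=(0, k_s // 2, k_s // 2))
        # 1D temporal kernel: k_t x 1 x 1, fixed window, static weights
        self.temporal = nn.Conv3d(out_ch, out_ch, kernel_size=(k_t, 1, 1),
                                  padding=(k_t // 2, 0, 0))

    def forward(self, x):  # x: (batch, channels, time, height, width)
        return self.temporal(torch.relu(self.spatial(x)))

clip = torch.randn(2, 3, 16, 112, 112)  # a 16-frame RGB clip
out = Factorized3DConv(3, 64)(clip)
print(out.shape)  # torch.Size([2, 64, 16, 112, 112])
```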
Leveraging large volumes of web videos paired with searched queries or surrounding texts (e.g., titles) offers an economical and extensible alternative to supervised video representation learning. Nevertheless, modeling such weakly visual-textual…
External link:
http://arxiv.org/abs/2206.10491
Motion, as the distinguishing characteristic of video, has been critical to the development of video understanding models. Modern deep learning models leverage motion by either executing spatio-temporal 3D convolutions, factorizing 3D convolutions into spatial and temporal…
External link:
http://arxiv.org/abs/2206.06931
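The two motion-modeling routes named in this abstract can be contrasted directly: a full 3D convolution mixes space and time in a single kernel, while the factorization sketched above splits them. A hedged comparison of their parameter counts, with arbitrary channel sizes not taken from the paper:

```python
# Contrast of the two routes the abstract names: a full spatio-temporal
# 3D convolution vs. its (2+1)D factorization. Purely illustrative.
import torch.nn as nn

in_ch, out_ch, k = 64, 64, 3

full_3d = nn.Conv3d(in_ch, out_ch, kernel_size=(k, k, k), padding=k // 2)
factorized = nn.Sequential(
    nn.Conv3d(in_ch, out_ch, kernel_size=(1, k, k), padding=(0, 1, 1)),  # spatial
    nn.Conv3d(out_ch, out_ch, kernel_size=(k, 1, 1), padding=(1, 0, 0)),  # temporal
)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(full_3d))     # 110,656: one 3x3x3 kernel per channel pair
print(count(factorized))  # 49,280: roughly half the parameters
```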
Author:
Pan, Yingwei, Li, Yehao, Zhang, Yiheng, Cai, Qi, Long, Fuchen, Qiu, Zhaofan, Yao, Ting, Mei, Tao
This paper presents an overview and comparative analysis of our systems designed for the following two tracks in the SAPIEN ManiSkill Challenge 2021. No Interaction Track: this track targets learning policies from pre-collected demonstrations…
External link:
http://arxiv.org/abs/2206.06289
With the knowledge of action moments (i.e., trimmed video clips that each contain an action instance), humans can routinely localize an action temporally in an untrimmed video. Nevertheless, most practical methods still require all training videos…
External link:
http://arxiv.org/abs/2008.13705
Temporally localizing actions in a video is a fundamental challenge in video understanding. Most existing approaches have drawn inspiration from image object detection and extended its advances, e.g., SSD and Faster R-CNN, to produce temporal l…
External link:
http://arxiv.org/abs/1909.03877
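The image-detector analogy in this abstract (SSD, Faster R-CNN) amounts to sliding anchors along the time axis instead of the image plane. Below is a minimal sketch of 1D temporal anchor generation and matching; the scales, stride, and function names are illustrative assumptions, not the paper's actual pipeline.

```python
# Minimal sketch of 1D temporal anchors, the video analogue of the 2D
# anchor boxes in SSD / Faster R-CNN that the abstract alludes to.
def temporal_anchors(num_steps, stride, scales=(1, 2, 4)):
    """Return (start, end) segments, in frames, centered on a regular grid."""
    anchors = []
    for i in range(num_steps):
        center = (i + 0.5) * stride
        for s in scales:
            half = 0.5 * s * stride
            anchors.append((center - half, center + half))
    return anchors

def temporal_iou(a, b):
    """IoU of two 1D segments, used to match anchors to ground-truth actions."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

anchors = temporal_anchors(num_steps=8, stride=16)
gt = (40.0, 90.0)  # a hypothetical ground-truth action instance, in frames
best = max(anchors, key=lambda a: temporal_iou(a, gt))
print(best, temporal_iou(best, gt))  # (40.0, 104.0) 0.781...
```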
This notebook paper presents an overview and comparative analysis of our system designed for activity detection in extended videos (ActEV-PC) in the ActivityNet Challenge 2019. Specifically, we exploit person/vehicle detections at the spatial level and action…
External link:
http://arxiv.org/abs/1906.08547