Showing 1 - 10 of 17 for search: '"Pu, Junfu"'
Author:
Wang, Jiangshan, Pu, Junfu, Qi, Zhongang, Guo, Jiayi, Ma, Yue, Huang, Nisha, Chen, Yuxin, Li, Xiu, Shan, Ying
Rectified-flow-based diffusion transformers, such as FLUX and OpenSora, have demonstrated exceptional performance in the field of image and video generation. Despite their robust generative capabilities, these models often suffer from inaccurate inve
External link:
http://arxiv.org/abs/2411.04746
Author:
Tan, Chaolei, Lin, Zihang, Pu, Junfu, Qi, Zhongang, Pei, Wei-Yi, Qu, Zhi, Wang, Yexin, Shan, Ying, Zheng, Wei-Shi, Hu, Jian-Fang
Video grounding is a fundamental problem in multimodal content understanding, aiming to localize specific natural language queries in an untrimmed video. However, current video grounding datasets merely focus on simple events and are either limited t
External link:
http://arxiv.org/abs/2408.01669
Author:
Chen, Yuxin, Ma, Zongyang, Zhang, Ziqi, Qi, Zhongang, Yuan, Chunfeng, Li, Bing, Pu, Junfu, Shan, Ying, Qi, Xiaojuan, Hu, Weiming
Dominant dual-encoder models enable efficient image-text retrieval but suffer from limited accuracy while the cross-encoder models offer higher accuracy at the expense of efficiency. Distilling cross-modality matching knowledge from cross-encoder to
External link:
http://arxiv.org/abs/2407.07479
Author:
Pu, Junfu, Shan, Ying
In this paper, we propose a novel framework for music-driven dance motion synthesis with controllable key pose constraint. In contrast to methods that generate dance motion sequences only based on music without any other controllable conditions, this
External link:
http://arxiv.org/abs/2207.03682
Although audio-visual representation has been proved to be applicable in many downstream tasks, the representation of dancing videos, which is more specific and always accompanied by music with complex auditory contents, remains challenging and uninv
External link:
http://arxiv.org/abs/2207.03190
Despite existing pioneering works on sign language translation (SLT), there is a non-trivial obstacle, i.e., the limited quantity of parallel sign-text data. To tackle this parallel data bottleneck, we propose a sign back-translation (SignBT) approac
External link:
http://arxiv.org/abs/2105.12397
Continuous sign language recognition (SLR) deals with unaligned video-text pair and uses the word error rate (WER), i.e., edit distance, as the main evaluation metric. Since it is not differentiable, we usually instead optimize the learning model wit
External link:
http://arxiv.org/abs/2010.05264
Sign language recognition (SLR) is a challenging problem, involving complex manual features, i.e., hand gestures, and fine-grained non-manual features (NMFs), i.e., facial expression, mouth shapes, etc. Although manual features are dominant, non-manu
External link:
http://arxiv.org/abs/2008.10428