Showing results 1 - 10 of 360 for search: '"Duan, Lixin"'
In the realm of stochastic human motion prediction (SHMP), researchers have often turned to generative models like GANs, VAEs and diffusion models. However, most previous approaches have struggled to accurately predict motions that are both realistic
External link:
http://arxiv.org/abs/2407.11494
Personalized text-to-image models allow users to generate varied styles of images (specified with a sentence) for an object (specified with a set of reference images). While remarkable results have been achieved using diffusion-based generation model
External link:
http://arxiv.org/abs/2407.06642
Existing view-based methods excel at recognizing 3D objects from predefined viewpoints, but their exploration of recognition under arbitrary views is limited. This is a challenging and realistic setting because each object has different viewpoint pos
External link:
http://arxiv.org/abs/2407.03842
The interactions between humans and objects are important for recognizing object-centric actions. Existing methods usually adopt a two-stage pipeline, where object proposals are first detected using a pretrained detector, and then are fed to an action
External link:
http://arxiv.org/abs/2404.11903
Authors:
Ge, Yanqi, Liu, Jiaqi, Fan, Qingnan, Jiang, Xi, Huang, Ye, Qin, Shuai, Gu, Hong, Li, Wen, Duan, Lixin
In this work, we target the task of text-driven style transfer in the context of text-to-image (T2I) diffusion models. The main challenge is consistent structure preservation while enabling effective style transfer effects. The past approaches in thi
External link:
http://arxiv.org/abs/2404.06835
We introduced SSR, which utilizes SAM (segment-anything) as a strong regularizer during training, to greatly enhance the robustness of the image encoder for handling various domains. Specifically, given the fact that SAM is pre-trained with a large n
External link:
http://arxiv.org/abs/2401.14686
Authors:
Ge, Yanqi, Nie, Qiang, Huang, Ye, Liu, Yong, Wang, Chengjie, Zheng, Feng, Li, Wen, Duan, Lixin
One of the ultimate goals of representation learning is to achieve compactness within a class and well-separability between classes. Many outstanding metric-based and prototype-based methods following the Expectation-Maximization paradigm have been
External link:
http://arxiv.org/abs/2312.11872
Unsupervised cross-domain action recognition aims at adapting the model trained on an existing labeled source domain to a new unlabeled target domain. Most existing methods solve the task by directly aligning the feature distributions of source and t
External link:
http://arxiv.org/abs/2311.14281
Unsupervised face animation aims to generate a human face video based on the appearance of a source image, mimicking the motion from a driving video. Existing methods typically adopted a prior-based motion model (e.g., the local affine motion model o
External link:
http://arxiv.org/abs/2310.13912
Existing pyramid-based upsamplers (e.g. SemanticFPN), although efficient, usually produce less accurate results compared to dilation-based models when using the same backbone. This is partially caused by the contaminated high-level features since the
External link:
http://arxiv.org/abs/2303.08646