Showing 1 - 10 of 35 for search: '"Xing, Jiazheng"'
Author:
Xing, Jiazheng, Xu, Chao, Qian, Yijie, Liu, Yang, Dai, Guang, Sun, Baigui, Liu, Yong, Wang, Jingdong
Virtual try-on focuses on adjusting the given clothes to fit a specific person seamlessly while avoiding any distortion of the patterns and textures of the garment. However, the clothing identity uncontrollability and training inefficiency of existing…
External link:
http://arxiv.org/abs/2404.00878
Author:
Hou, Xiaojun, Xing, Jiazheng, Qian, Yijie, Guo, Yaowei, Xin, Shuo, Chen, Junhao, Tang, Kai, Wang, Mengmeng, Jiang, Zhengkai, Liu, Liang, Liu, Yong
Multimodal Visual Object Tracking (VOT) has recently gained significant attention due to its robustness. Early research focused on fully fine-tuning RGB-based trackers, which was inefficient and lacked generalized representation due to the scarcity of…
External link:
http://arxiv.org/abs/2403.16002
Author:
Xu, Chao, Liu, Yang, Xing, Jiazheng, Wang, Weida, Sun, Mingze, Dan, Jun, Huang, Tianxin, Li, Siyuan, Cheng, Zhi-Qi, Tai, Ying, Sun, Baigui
In this paper, we abstract the process of people hearing speech, extracting meaningful cues, and creating various dynamically audio-consistent talking faces, termed Listening and Imagining, into the task of high-fidelity diverse talking faces generation…
External link:
http://arxiv.org/abs/2403.01901
Author:
Wang, Mengmeng, Xing, Jiazheng, Jiang, Boyuan, Chen, Jun, Mei, Jianbiao, Zuo, Xingxing, Dai, Guang, Wang, Jingdong, Liu, Yong
Published in:
AAAI2024
Recently, the rise of large-scale vision-language pretrained models like CLIP, coupled with the technology of Parameter-Efficient Fine-Tuning (PEFT), has captured substantial attention in video action recognition. Nevertheless, prevailing approaches…
External link:
http://arxiv.org/abs/2401.11649
Author:
Xing, Jiazheng, Wang, Mengmeng, Ruan, Yudi, Chen, Bofan, Guo, Yaowei, Mu, Boyu, Dai, Guang, Wang, Jingdong, Liu, Yong
Class prototype construction and matching are core aspects of few-shot action recognition. Previous methods mainly focus on designing spatiotemporal relation modeling modules or complex temporal alignment algorithms. Despite the promising results, …
External link:
http://arxiv.org/abs/2308.09346
Applying large-scale pre-trained visual models like CLIP to few-shot action recognition tasks can benefit performance and efficiency. Utilizing the "pre-training, fine-tuning" paradigm makes it possible to avoid training a network from scratch, which…
External link:
http://arxiv.org/abs/2308.01532
Spatial and temporal modeling is one of the most core aspects of few-shot action recognition. Most previous works mainly focus on long-term temporal relation modeling based on high-level spatial representations, without considering the crucial low-level…
External link:
http://arxiv.org/abs/2301.07944
The canonical approach to video action recognition dictates a neural model to do a classic and standard 1-of-N majority vote task. Such models are trained to predict a fixed set of predefined categories, limiting their transferable ability on new datasets with…
External link:
http://arxiv.org/abs/2109.08472
Author:
Tang, Qing, Sensale, Sebastian, Bond, Charles, Xing, Jiazheng, Qiao, Andy, Hugelier, Siewert, Arab, Arian, Arya, Gaurav, Lakadamyali, Melike
Published in:
In Current Biology 4 December 2023 33(23):5169-5184
Published in:
In Procedia Manufacturing 2021 54:269-273