Showing 1 - 10 of 1,105 for search: '"Ji, Rongrong"'
Recent progress in 3D object generation has been fueled by the strong priors offered by diffusion models. However, existing models are tailored to specific tasks, accommodating only one modality at a time and necessitating retraining to change modalities …
External link:
http://arxiv.org/abs/2411.14715
Author:
Luo, Yongdong; Zheng, Xiawu; Yang, Xiao; Li, Guilin; Lin, Haojia; Huang, Jinfa; Ji, Jiayi; Chao, Fei; Luo, Jiebo; Ji, Rongrong
Existing large video-language models (LVLMs) struggle to comprehend long videos correctly due to limited context. To address this problem, fine-tuning long-context LVLMs and employing GPT-based agents have emerged as promising solutions. However, fine-tuning …
External link:
http://arxiv.org/abs/2411.13093
This paper makes a step towards modeling the modality discrepancy in the cross-spectral re-identification task. Based on the Lambertian model, we observe that the non-linear modality discrepancy mainly comes from diverse linear transformations acting … (see the sketch after this entry)
External link:
http://arxiv.org/abs/2411.01225
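A note on the Lambertian claim above, as a back-of-envelope reading (my notation, not necessarily the paper's): under the Lambertian model, the intensity observed in spectrum c at pixel x is

    I_c(x) = \rho_c(x) \, \big( \mathbf{n}(x) \cdot \mathbf{l} \big)

where \rho_c is the material's albedo in that spectrum, \mathbf{n} the surface normal, and \mathbf{l} the lighting direction. For a fixed material the two spectra differ only in albedo, so I_{\mathrm{IR}}(x) \approx a \, I_{\mathrm{RGB}}(x) is a per-material linear transformation; because the coefficient a varies across the materials in one image, the aggregate discrepancy looks non-linear, which is consistent with the abstract's observation.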
In this paper, we propose TextDestroyer, the first training- and annotation-free method for scene text destruction using a pre-trained diffusion model. Existing scene text removal models require complex annotation and retraining, and may leave faint …
External link:
http://arxiv.org/abs/2411.00355
Despite the significant progress in multimodal large language models (MLLMs), their high computational cost remains a barrier to real-world deployment. Inspired by the mixture of depths (MoDs) in natural language processing, we aim to address this limitation … (see the sketch after this entry)
External link:
http://arxiv.org/abs/2410.13859
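A rough picture of the mixture-of-depths idea this abstract builds on: a learned router scores each token, only the top-k tokens pass through the expensive block, and the rest take the residual shortcut. A minimal PyTorch sketch under that reading (illustrative names, not the paper's implementation):

    import torch
    import torch.nn as nn

    class MoDLayer(nn.Module):
        # Route only a fraction of tokens through the wrapped block;
        # the remaining tokens skip it via the residual path.
        def __init__(self, block: nn.Module, d_model: int, capacity: float = 0.5):
            super().__init__()
            self.block = block          # any (batch, seq, d_model) -> same-shape module
            self.router = nn.Linear(d_model, 1)
            self.capacity = capacity    # fraction of tokens processed per sequence

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            scores = self.router(x).squeeze(-1)            # (batch, seq) routing scores
            k = max(1, int(self.capacity * x.size(1)))
            top = scores.topk(k, dim=1).indices            # tokens chosen for computation
            idx = top.unsqueeze(-1).expand(-1, -1, x.size(-1))
            chosen = x.gather(1, idx)                      # (batch, k, d_model)
            gate = torch.sigmoid(scores.gather(1, top)).unsqueeze(-1)
            out = x.clone()                                # unselected tokens pass through unchanged
            out.scatter_(1, idx, chosen + gate * self.block(chosen))
            return out

    # Usage: wrap a standard transformer layer; only ~half the tokens are computed.
    layer = MoDLayer(nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True), d_model=256)
    y = layer(torch.randn(2, 64, 256))

Gating the block output by the router score keeps token selection differentiable, which is the usual trick for training hard routing end to end.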
The rapid progress of Deepfake technology has made face swapping highly realistic, raising concerns about the malicious use of fabricated facial content. Existing methods often struggle to generalize to unseen domains due to the diverse nature of facial …
External link:
http://arxiv.org/abs/2410.04372
Image Quality Assessment (IQA) remains an unresolved challenge in the field of computer vision, due to complex distortion conditions, diverse image content, and limited data availability. The existing Blind IQA (BIQA) methods heavily rely on extensive …
External link:
http://arxiv.org/abs/2409.05381
Extracting robust feature representation is critical for object re-identification to accurately identify objects across non-overlapping cameras. Although having a strong representation ability, the Vision Transformer (ViT) tends to overfit on most di…
External link:
http://arxiv.org/abs/2408.16684
Author:
Ma, Yiwei; Ji, Jiayi; Ye, Ke; Lin, Weihuang; Wang, Zhibin; Zheng, Yonghan; Zhou, Qiang; Sun, Xiaoshuai; Ji, Rongrong
Significant progress has been made in the field of Instruction-based Image Editing (IIE). However, evaluating these models poses a significant challenge. A crucial requirement in this field is the establishment of a comprehensive evaluation benchmark …
External link:
http://arxiv.org/abs/2408.14180
Author:
Wu, Mingrui; Huang, Oucheng; Ji, Jiayi; Li, Jiale; Cai, Xinyue; Kuang, Huafeng; Liu, Jianzhuang; Sun, Xiaoshuai; Ji, Rongrong
In this work, we propose a training-free, trajectory-based controllable T2I approach, termed TraDiffusion. This novel method allows users to effortlessly guide image generation via mouse trajectories. To achieve precise control, we design a distance … (see the sketch after this entry)
External link:
http://arxiv.org/abs/2408.09739
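The truncated "distance …" above presumably refers to a distance-to-trajectory term. As a hypothetical sketch of such a control signal (my formulation, not TraDiffusion's), one can penalize cross-attention mass that falls far from the user-drawn trajectory:

    import numpy as np

    def trajectory_energy(attn: np.ndarray, traj: np.ndarray, margin: float = 8.0) -> float:
        """Toy distance-based control energy.
        attn: (H, W) non-negative cross-attention map for the target token.
        traj: (N, 2) trajectory points as (row, col) pixel coordinates.
        Returns a penalty on attention mass outside a margin around the
        trajectory (0 = attention fully on-trajectory)."""
        h, w = attn.shape
        ys, xs = np.mgrid[0:h, 0:w]
        pix = np.stack([ys, xs], axis=-1).reshape(-1, 2)      # every pixel coordinate
        d = np.linalg.norm(pix[:, None, :] - traj[None, :, :], axis=-1).min(axis=1)
        dist = d.reshape(h, w)                                # distance to nearest trajectory point
        weights = np.clip(dist / margin - 1.0, 0.0, None)     # zero inside the margin, grows outside
        return float((weights * attn).sum() / (attn.sum() + 1e-8))

In a guidance loop one would compute an energy like this in torch on the denoiser's attention maps and backpropagate to the latents, nudging the generated object toward the trajectory.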