Showing 1 - 10 of 447 results for search: '"Zha, Zheng Jun"'
Perceiving potential "action possibilities" (i.e., affordance) regions of images and learning interactive functionalities of objects from human demonstration is a challenging task due to the diversity of human-object interactions. Prevailing afforda
External link:
http://arxiv.org/abs/2410.11363
ForgeryGPT: Multimodal Large Language Model For Explainable Image Forgery Detection and Localization
Multimodal Large Language Models (MLLMs), such as GPT4o, have shown strong capabilities in visual reasoning and explanation generation. However, despite these strengths, they face significant challenges in the increasingly critical task of Image Forg
External link:
http://arxiv.org/abs/2410.10238
Recent advancements in multi-modal large language models have propelled the development of joint probabilistic models capable of both image understanding and generation. However, we have identified that recent methods inevitably suffer from loss of i
External link:
http://arxiv.org/abs/2410.10798
Author:
Wu, Wei, Zheng, Kecheng, Ma, Shuailei, Lu, Fan, Guo, Yuxin, Zhang, Yifei, Chen, Wei, Guo, Qingpei, Shen, Yujun, Zha, Zheng-Jun
Understanding long text is in great demand in practice but beyond the reach of most language-image pre-training (LIP) models. In this work, we empirically confirm that the key reason causing such an issue is that the training images are usually pair
External link:
http://arxiv.org/abs/2410.05249
Grounding 3D scene affordance aims to locate interactive regions in 3D environments, which is crucial for embodied agents to interact intelligently with their surroundings. Most existing approaches achieve this by mapping semantics to 3D instances ba
External link:
http://arxiv.org/abs/2409.19650
Leveraging pretrained 2D diffusion models and score distillation sampling (SDS), recent methods have shown promising results for text-to-3D avatar generation. However, generating high-quality 3D avatars capable of expressive animation remains challen
External link:
http://arxiv.org/abs/2409.17145
Author:
Di, Xin, Peng, Long, Xia, Peizhe, Li, Wenbo, Pei, Renjing, Cao, Yang, Wang, Yang, Zha, Zheng-Jun
Burst super-resolution aims to reconstruct high-resolution images with higher quality and richer details by fusing the sub-pixel information from multiple burst low-resolution frames. In BurstSR, the key challenge lies in extracting the base frame's
External link:
http://arxiv.org/abs/2408.08665
Active learning (AL) is designed to construct a high-quality labeled dataset by iteratively selecting the most informative samples. Such sampling heavily relies on data representation, while recently pre-training is popular for robust feature learnin
External link:
http://arxiv.org/abs/2407.14720
Moiré patterns are commonly seen when taking photos of screens. Camera devices usually have limited hardware performance but take high-resolution photos. However, users are sensitive to the photo processing time, which presents a hardly considered
External link:
http://arxiv.org/abs/2406.14912
Hybrid Event-Based Vision Sensor (HybridEVS) is a novel sensor integrating traditional frame-based and event-based sensors, offering substantial benefits for applications requiring low-light, high dynamic range, and low-latency environments, such as
External link:
http://arxiv.org/abs/2406.07951