Showing 1 - 10 of 447 results for search: '"Zha, Zheng Jun"'
Perceiving potential "action possibilities" (i.e., affordance) regions of images and learning interactive functionalities of objects from human demonstration is a challenging task due to the diversity of human-object interactions. Prevailing afforda
External link:
http://arxiv.org/abs/2410.11363
ForgeryGPT: Multimodal Large Language Model For Explainable Image Forgery Detection and Localization
Multimodal Large Language Models (MLLMs), such as GPT4o, have shown strong capabilities in visual reasoning and explanation generation. However, despite these strengths, they face significant challenges in the increasingly critical task of Image Forg
External link:
http://arxiv.org/abs/2410.10238
Recent advancements in multi-modal large language models have propelled the development of joint probabilistic models capable of both image understanding and generation. However, we have identified that recent methods inevitably suffer from loss of i
External link:
http://arxiv.org/abs/2410.10798
Author:
Wu, Wei, Zheng, Kecheng, Ma, Shuailei, Lu, Fan, Guo, Yuxin, Zhang, Yifei, Chen, Wei, Guo, Qingpei, Shen, Yujun, Zha, Zheng-Jun
Understanding long text is in great demand in practice but beyond the reach of most language-image pre-training (LIP) models. In this work, we empirically confirm that the key reason causing such an issue is that the training images are usually pair
External link:
http://arxiv.org/abs/2410.05249
Grounding 3D scene affordance aims to locate interactive regions in 3D environments, which is crucial for embodied agents to interact intelligently with their surroundings. Most existing approaches achieve this by mapping semantics to 3D instances ba
External link:
http://arxiv.org/abs/2409.19650
Leveraging pretrained 2D diffusion models and score distillation sampling (SDS), recent methods have shown promising results for text-to-3D avatar generation. However, generating high-quality 3D avatars capable of expressive animation remains challen
External link:
http://arxiv.org/abs/2409.17145
Author:
Di, Xin, Peng, Long, Xia, Peizhe, Li, Wenbo, Pei, Renjing, Cao, Yang, Wang, Yang, Zha, Zheng-Jun
Burst super-resolution aims to reconstruct high-resolution images with higher quality and richer details by fusing the sub-pixel information from multiple burst low-resolution frames. In BurstSR, the key challenge lies in extracting the base frame's
External link:
http://arxiv.org/abs/2408.08665
Active learning (AL) is designed to construct a high-quality labeled dataset by iteratively selecting the most informative samples. Such sampling heavily relies on data representation, while recently pre-training is popular for robust feature learnin
External link:
http://arxiv.org/abs/2407.14720
Moiré patterns are commonly seen when taking photos of screens. Camera devices usually have limited hardware performance but take high-resolution photos. However, users are sensitive to the photo processing time, which presents a hardly considered
External link:
http://arxiv.org/abs/2406.14912
Hybrid Event-Based Vision Sensor (HybridEVS) is a novel sensor integrating traditional frame-based and event-based sensors, offering substantial benefits for applications requiring low-light, high dynamic range, and low-latency environments, such as
External link:
http://arxiv.org/abs/2406.07951