Showing 1 - 2 of 2 for search: '"Shen, Haozhan"'
Visual grounding, a crucial vision-language task involving the understanding of the visual context based on the query expression, necessitates the model to capture the interactions between objects, as well as various spatial and attribute information…
External link:
http://arxiv.org/abs/2312.15043
Authors:
Zhao, Tiancheng; Zhang, Tianqi; Zhu, Mingwei; Shen, Haozhan; Lee, Kyusong; Lu, Xiaopeng; Yin, Jianwei
Vision-Language Pretraining (VLP) models have recently successfully facilitated many cross-modal downstream tasks. Most existing works evaluated their systems by comparing the fine-tuned downstream task performance. However, only average downstream t…
External link:
http://arxiv.org/abs/2207.00221