Search results - "Zhang, Xiaohan"

Report

Cross-View Meets Diffusion: Aerial Image Synthesis with Geometry and Text Guidance

Authors: Arrabi, Ahmad; Zhang, Xiaohan; Sultan, Waqas; Chen, Chen; Wshah, Safwan

Aerial imagery analysis is critical for many research fields. However, obtaining frequent high-quality aerial images is not always accessible due to its high effort and cost requirements. One solution is to use the Ground-to-Aerial (G2A) technique to

External link: http://arxiv.org/abs/2408.04224

Report

DKPROMPT: Domain Knowledge Prompting Vision-Language Models for Open-World Planning

Authors: Zhang, Xiaohan; Altaweel, Zainab; Hayamizu, Yohei; Ding, Yan; Amiri, Saeid; Yang, Hao; Kaminski, Andy; Esselink, Chad; Zhang, Shiqi

Vision-language models (VLMs) have been applied to robot task planning problems, where the robot receives a task in natural language and generates plans based on visual inputs. While current VLMs have demonstrated strong vision-language understanding

External link: http://arxiv.org/abs/2406.17659

Report

SpreadsheetBench: Towards Challenging Real World Spreadsheet Manipulation

Authors: Ma, Zeyao; Zhang, Bohan; Zhang, Jing; Yu, Jifan; Zhang, Xiaokang; Zhang, Xiaohan; Luo, Sijia; Wang, Xi; Tang, Jie

We introduce SpreadsheetBench, a challenging spreadsheet manipulation benchmark exclusively derived from real-world scenarios, designed to immerse current large language models (LLMs) in the actual workflow of spreadsheet users. Unlike existing bench

External link: http://arxiv.org/abs/2406.14991

Report

ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools

We introduce ChatGLM, an evolving family of large language models that we have been developing over time. This report primarily focuses on the GLM-4 language series, which includes GLM-4, GLM-4-Air, and GLM-4-9B. They represent our most capable model

External link: http://arxiv.org/abs/2406.12793

Report

AlignMMBench: Evaluating Chinese Multimodal Alignment in Large Vision-Language Models

Authors: Wu, Yuhang; Yu, Wenmeng; Cheng, Yean; Wang, Yan; Zhang, Xiaohan; Xu, Jiazheng; Ding, Ming; Dong, Yuxiao

Evaluating the alignment capabilities of large Vision-Language Models (VLMs) is essential for determining their effectiveness as helpful assistants. However, existing benchmarks primarily focus on basic abilities using nonverbal methods, such as yes-

External link: http://arxiv.org/abs/2406.09295

Report

LVBench: An Extreme Long Video Understanding Benchmark

Authors: Wang, Weihan; He, Zehai; Hong, Wenyi; Cheng, Yean; Zhang, Xiaohan; Qi, Ji; Huang, Shiyu; Xu, Bin; Dong, Yuxiao; Ding, Ming; Tang, Jie

Recent progress in multimodal large language models has markedly enhanced the understanding of short videos (typically under one minute), and several evaluation datasets have emerged accordingly. However, these advancements fall short of meeting the

External link: http://arxiv.org/abs/2406.08035

Report

Rethinking Early-Fusion Strategies for Improved Multispectral Object Detection

Authors: Zhang, Xue; Cao, Si-Yuan; Wang, Fang; Zhang, Runmin; Wu, Zhe; Zhang, Xiaohan; Bai, Xiaokai; Shen, Hui-Liang

Most recent multispectral object detectors employ a two-branch structure to extract features from RGB and thermal images. While the two-branch structure achieves better performance than a single-branch structure, it overlooks inference efficiency. Th

External link: http://arxiv.org/abs/2405.16038

Report

Aerial-NeRF: Adaptive Spatial Partitioning and Sampling for Large-Scale Aerial Rendering

Authors: Zhang, Xiaohan; Qiu, Yukui; Sun, Zhenyu; Liu, Qi

Recent progress in large-scale scene rendering has yielded Neural Radiance Fields (NeRF)-based models with an impressive ability to synthesize scenes across small objects and indoor scenes. Nevertheless, extending this idea to large-scale aerial rend

External link: http://arxiv.org/abs/2405.06214

Report

NaturalCodeBench: Examining Coding Performance Mismatch on HumanEval and Natural User Prompts

Authors: Zhang, Shudan; Zhao, Hanlin; Liu, Xiao; Zheng, Qinkai; Qi, Zehan; Gu, Xiaotao; Zhang, Xiaohan; Dong, Yuxiao; Tang, Jie

Large language models (LLMs) have manifested strong ability to generate codes for productive activities. However, current benchmarks for code synthesis, such as HumanEval, MBPP, and DS-1000, are predominantly oriented towards introductory tasks on al

External link: http://arxiv.org/abs/2405.04520

Report

AG-NeRF: Attention-guided Neural Radiance Fields for Multi-height Large-scale Outdoor Scene Rendering

Authors: Guo, Jingfeng; Zhang, Xiaohan; Zhao, Baozhu; Liu, Qi

Existing neural radiance fields (NeRF)-based novel view synthesis methods for large-scale outdoor scenes are mainly built on a single altitude. Moreover, they often require a priori camera shooting height and scene scope, leading to inefficient and i

External link: http://arxiv.org/abs/2404.11897
