Zobrazeno 1 - 10
of 2 012
pro vyhledávání: '"Zhang, Xiaohan"'
Aerial imagery analysis is critical for many research fields. However, obtaining frequent high-quality aerial images is not always accessible due to its high effort and cost requirements. One solution is to use the Ground-to-Aerial (G2A) technique to
Externí odkaz:
http://arxiv.org/abs/2408.04224
Autor:
Zhang, Xiaohan, Altaweel, Zainab, Hayamizu, Yohei, Ding, Yan, Amiri, Saeid, Yang, Hao, Kaminski, Andy, Esselink, Chad, Zhang, Shiqi
Vision-language models (VLMs) have been applied to robot task planning problems, where the robot receives a task in natural language and generates plans based on visual inputs. While current VLMs have demonstrated strong vision-language understanding
Externí odkaz:
http://arxiv.org/abs/2406.17659
Autor:
Ma, Zeyao, Zhang, Bohan, Zhang, Jing, Yu, Jifan, Zhang, Xiaokang, Zhang, Xiaohan, Luo, Sijia, Wang, Xi, Tang, Jie
We introduce SpreadsheetBench, a challenging spreadsheet manipulation benchmark exclusively derived from real-world scenarios, designed to immerse current large language models (LLMs) in the actual workflow of spreadsheet users. Unlike existing bench
Externí odkaz:
http://arxiv.org/abs/2406.14991
Autor:
GLM, Team, Zeng, Aohan, Xu, Bin, Wang, Bowen, Zhang, Chenhui, Yin, Da, Zhang, Dan, Rojas, Diego, Feng, Guanyu, Zhao, Hanlin, Lai, Hanyu, Yu, Hao, Wang, Hongning, Sun, Jiadai, Zhang, Jiajie, Cheng, Jiale, Gui, Jiayi, Tang, Jie, Zhang, Jing, Sun, Jingyu, Li, Juanzi, Zhao, Lei, Wu, Lindong, Zhong, Lucen, Liu, Mingdao, Huang, Minlie, Zhang, Peng, Zheng, Qinkai, Lu, Rui, Duan, Shuaiqi, Zhang, Shudan, Cao, Shulin, Yang, Shuxun, Tam, Weng Lam, Zhao, Wenyi, Liu, Xiao, Xia, Xiao, Zhang, Xiaohan, Gu, Xiaotao, Lv, Xin, Liu, Xinghan, Liu, Xinyi, Yang, Xinyue, Song, Xixuan, Zhang, Xunkai, An, Yifan, Xu, Yifan, Niu, Yilin, Yang, Yuantao, Li, Yueyan, Bai, Yushi, Dong, Yuxiao, Qi, Zehan, Wang, Zhaoyu, Yang, Zhen, Du, Zhengxiao, Hou, Zhenyu, Wang, Zihan
We introduce ChatGLM, an evolving family of large language models that we have been developing over time. This report primarily focuses on the GLM-4 language series, which includes GLM-4, GLM-4-Air, and GLM-4-9B. They represent our most capable model
Externí odkaz:
http://arxiv.org/abs/2406.12793
Autor:
Wu, Yuhang, Yu, Wenmeng, Cheng, Yean, Wang, Yan, Zhang, Xiaohan, Xu, Jiazheng, Ding, Ming, Dong, Yuxiao
Evaluating the alignment capabilities of large Vision-Language Models (VLMs) is essential for determining their effectiveness as helpful assistants. However, existing benchmarks primarily focus on basic abilities using nonverbal methods, such as yes-
Externí odkaz:
http://arxiv.org/abs/2406.09295
Autor:
Wang, Weihan, He, Zehai, Hong, Wenyi, Cheng, Yean, Zhang, Xiaohan, Qi, Ji, Huang, Shiyu, Xu, Bin, Dong, Yuxiao, Ding, Ming, Tang, Jie
Recent progress in multimodal large language models has markedly enhanced the understanding of short videos (typically under one minute), and several evaluation datasets have emerged accordingly. However, these advancements fall short of meeting the
Externí odkaz:
http://arxiv.org/abs/2406.08035
Autor:
Zhang, Xue, Cao, Si-Yuan, Wang, Fang, Zhang, Runmin, Wu, Zhe, Zhang, Xiaohan, Bai, Xiaokai, Shen, Hui-Liang
Most recent multispectral object detectors employ a two-branch structure to extract features from RGB and thermal images. While the two-branch structure achieves better performance than a single-branch structure, it overlooks inference efficiency. Th
Externí odkaz:
http://arxiv.org/abs/2405.16038
Recent progress in large-scale scene rendering has yielded Neural Radiance Fields (NeRF)-based models with an impressive ability to synthesize scenes across small objects and indoor scenes. Nevertheless, extending this idea to large-scale aerial rend
Externí odkaz:
http://arxiv.org/abs/2405.06214
Autor:
Zhang, Shudan, Zhao, Hanlin, Liu, Xiao, Zheng, Qinkai, Qi, Zehan, Gu, Xiaotao, Zhang, Xiaohan, Dong, Yuxiao, Tang, Jie
Large language models (LLMs) have manifested strong ability to generate codes for productive activities. However, current benchmarks for code synthesis, such as HumanEval, MBPP, and DS-1000, are predominantly oriented towards introductory tasks on al
Externí odkaz:
http://arxiv.org/abs/2405.04520
Existing neural radiance fields (NeRF)-based novel view synthesis methods for large-scale outdoor scenes are mainly built on a single altitude. Moreover, they often require a priori camera shooting height and scene scope, leading to inefficient and i
Externí odkaz:
http://arxiv.org/abs/2404.11897