Showing 1 - 10 of 354 results for the search: '"Meng, Fanqing"'
Author:
Meng, Fanqing, Wang, Jin, Li, Chuanhao, Lu, Quanfeng, Tian, Hao, Liao, Jiaqi, Zhu, Xizhou, Dai, Jifeng, Qiao, Yu, Luo, Ping, Zhang, Kaipeng, Shao, Wenqi
The capability to process multiple images is crucial for Large Vision-Language Models (LVLMs) to develop a more thorough and nuanced understanding of a scene. Recent multi-image LVLMs have begun to address this need. However, their evaluation has not …
External link:
http://arxiv.org/abs/2408.02718
Author:
Meng, Fanqing, Shao, Wenqi, Luo, Lixin, Wang, Yahong, Chen, Yiran, Lu, Quanfeng, Yang, Yue, Yang, Tianshuo, Zhang, Kaipeng, Qiao, Yu, Luo, Ping
Text-to-image (T2I) models have made substantial progress in generating images from textual prompts. However, they frequently fail to produce images consistent with physical commonsense, a vital capability for applications in world simulation and …
External link:
http://arxiv.org/abs/2406.11802
Author:
Lu, Quanfeng, Shao, Wenqi, Liu, Zitao, Meng, Fanqing, Li, Boxuan, Chen, Botong, Huang, Siyuan, Zhang, Kaipeng, Qiao, Yu, Luo, Ping
Smartphone users often navigate across multiple applications (apps) to complete tasks such as sharing content between social media platforms. Autonomous Graphical User Interface (GUI) navigation agents can enhance user experience in communication, …
External link:
http://arxiv.org/abs/2406.08451
Author:
Ying, Kaining, Meng, Fanqing, Wang, Jin, Li, Zhiqian, Lin, Han, Yang, Yue, Zhang, Hao, Zhang, Wenbo, Lin, Yuqi, Liu, Shuo, Lei, Jiayi, Lu, Quanfeng, Chen, Runjian, Xu, Peng, Zhang, Renrui, Zhang, Haozhe, Gao, Peng, Wang, Yali, Qiao, Yu, Luo, Ping, Zhang, Kaipeng, Shao, Wenqi
Large Vision-Language Models (LVLMs) have made significant strides in general-purpose multimodal applications such as visual dialogue and embodied navigation. However, existing multimodal evaluation benchmarks cover only a limited number of multimodal tasks …
External link:
http://arxiv.org/abs/2404.16006
Charts play a vital role in data visualization, understanding data patterns, and informed decision-making. However, their unique combination of graphical elements (e.g., bars, lines) and textual components (e.g., labels, legends) poses challenges for …
External link:
http://arxiv.org/abs/2401.02384
Author:
Meng, Fanqing, Shao, Wenqi, Peng, Zhanglin, Jiang, Chonghe, Zhang, Kaipeng, Qiao, Yu, Luo, Ping
This paper investigates an under-explored but important problem: given a collection of pre-trained neural networks, predicting their performance on each multi-modal task without fine-tuning them, such as image recognition, referring, captioning, …
External link:
http://arxiv.org/abs/2308.06262
Author:
Shao, Wenqi, Lei, Meng, Hu, Yutao, Gao, Peng, Zhang, Kaipeng, Meng, Fanqing, Xu, Peng, Huang, Siyuan, Li, Hongsheng, Qiao, Yu, Luo, Ping
Recent advancements in Large Vision-Language Models (LVLMs) have demonstrated significant progress in tackling complex multimodal tasks. Among these cutting-edge developments, Google's Bard stands out for its remarkable multimodal capabilities, …
External link:
http://arxiv.org/abs/2308.03729
Author:
Xu, Peng, Shao, Wenqi, Zhang, Kaipeng, Gao, Peng, Liu, Shuo, Lei, Meng, Meng, Fanqing, Huang, Siyuan, Qiao, Yu, Luo, Ping
Large Vision-Language Models (LVLMs) have recently played a dominant role in multimodal vision-language learning. Despite this success, a holistic evaluation of their efficacy is still lacking. This paper presents a comprehensive evaluation of publicly …
External link:
http://arxiv.org/abs/2306.09265
Author:
Chen, Xiaxia, Wang, Xiang, Wang, Jingxue, Xu, Hongwei, Liu, Chao, Wang, Yinglong, Sun, Shiqin, Cui, Peizhe, Meng, Fanqing
Published in:
In Chemical Engineering Science, 5 October 2024, 298
Author:
Meng, Fanqing, Liu, Chao, Guo, Juan, Wang, Jingxue, Zhao, Lifang, Xu, Hongwei, Chen, Xiaxia, Wang, Yinglong, Zhu, Zhaoyou, Zheng, Zhonghui, Cui, Peizhe
Published in:
In Separation and Purification Technology, 19 January 2025, 353, Part A