Showing 1 - 10 of 354 results for the search: '"Meng, Fanqing"'
Author:
Meng, Fanqing, Wang, Jin, Li, Chuanhao, Lu, Quanfeng, Tian, Hao, Liao, Jiaqi, Zhu, Xizhou, Dai, Jifeng, Qiao, Yu, Luo, Ping, Zhang, Kaipeng, Shao, Wenqi
The capability to process multiple images is crucial for Large Vision-Language Models (LVLMs) to develop a more thorough and nuanced understanding of a scene. Recent multi-image LVLMs have begun to address this need. However, their evaluation has not …
External link:
http://arxiv.org/abs/2408.02718
Author:
Meng, Fanqing, Shao, Wenqi, Luo, Lixin, Wang, Yahong, Chen, Yiran, Lu, Quanfeng, Yang, Yue, Yang, Tianshuo, Zhang, Kaipeng, Qiao, Yu, Luo, Ping
Text-to-image (T2I) models have made substantial progress in generating images from textual prompts. However, they frequently fail to produce images consistent with physical commonsense, a vital capability for applications in world simulation and …
External link:
http://arxiv.org/abs/2406.11802
Author:
Lu, Quanfeng, Shao, Wenqi, Liu, Zitao, Meng, Fanqing, Li, Boxuan, Chen, Botong, Huang, Siyuan, Zhang, Kaipeng, Qiao, Yu, Luo, Ping
Smartphone users often navigate across multiple applications (apps) to complete tasks such as sharing content between social media platforms. Autonomous Graphical User Interface (GUI) navigation agents can enhance user experience in communication, …
External link:
http://arxiv.org/abs/2406.08451
Author:
Ying, Kaining, Meng, Fanqing, Wang, Jin, Li, Zhiqian, Lin, Han, Yang, Yue, Zhang, Hao, Zhang, Wenbo, Lin, Yuqi, Liu, Shuo, Lei, Jiayi, Lu, Quanfeng, Chen, Runjian, Xu, Peng, Zhang, Renrui, Zhang, Haozhe, Gao, Peng, Wang, Yali, Qiao, Yu, Luo, Ping, Zhang, Kaipeng, Shao, Wenqi
Large Vision-Language Models (LVLMs) have made significant strides in general-purpose multimodal applications such as visual dialogue and embodied navigation. However, existing multimodal evaluation benchmarks cover only a limited number of multimodal tasks …
External link:
http://arxiv.org/abs/2404.16006
Charts play a vital role in data visualization, understanding data patterns, and informed decision-making. However, their unique combination of graphical elements (e.g., bars, lines) and textual components (e.g., labels, legends) poses challenges for …
External link:
http://arxiv.org/abs/2401.02384
Author:
Meng, Fanqing, Shao, Wenqi, Peng, Zhanglin, Jiang, Chonghe, Zhang, Kaipeng, Qiao, Yu, Luo, Ping
This paper investigates an under-explored but important problem: given a collection of pre-trained neural networks, predicting their performance on each multi-modal task without fine-tuning them, such as image recognition, referring, captioning, …
External link:
http://arxiv.org/abs/2308.06262
Author:
Shao, Wenqi, Lei, Meng, Hu, Yutao, Gao, Peng, Zhang, Kaipeng, Meng, Fanqing, Xu, Peng, Huang, Siyuan, Li, Hongsheng, Qiao, Yu, Luo, Ping
Recent advancements in Large Vision-Language Models (LVLMs) have demonstrated significant progress in tackling complex multimodal tasks. Among these cutting-edge developments, Google's Bard stands out for its remarkable multimodal capabilities, …
External link:
http://arxiv.org/abs/2308.03729
Author:
Xu, Peng, Shao, Wenqi, Zhang, Kaipeng, Gao, Peng, Liu, Shuo, Lei, Meng, Meng, Fanqing, Huang, Siyuan, Qiao, Yu, Luo, Ping
Large Vision-Language Models (LVLMs) have recently played a dominant role in multimodal vision-language learning. Despite this success, a holistic evaluation of their efficacy is still lacking. This paper presents a comprehensive evaluation of publicly …
External link:
http://arxiv.org/abs/2306.09265
Author:
Chen, Xiaxia, Wang, Xiang, Wang, Jingxue, Xu, Hongwei, Liu, Chao, Wang, Yinglong, Sun, Shiqin, Cui, Peizhe, Meng, Fanqing
Published in:
In Chemical Engineering Science, 5 October 2024, 298
Author:
Meng, Fanqing, Liu, Chao, Guo, Juan, Wang, Jingxue, Zhao, Lifang, Xu, Hongwei, Chen, Xiaxia, Wang, Yinglong, Zhu, Zhaoyou, Zheng, Zhonghui, Cui, Peizhe
Published in:
In Separation and Purification Technology, 19 January 2025, 353, Part A