Výsledky vyhledávání - "Li, Chunyuan"

Report

Long Context Transfer from Language to Vision

Autor: Zhang, Peiyuan, Zhang, Kaichen, Li, Bo, Zeng, Guangtao, Yang, Jingkang, Zhang, Yuanhan, Wang, Ziyue, Tan, Haoran, Li, Chunyuan, Liu, Ziwei

Video sequences offer valuable temporal information, but existing large multimodal models (LMMs) fall short in understanding extremely long videos. Many works address this by reducing the number of visual tokens using visual resamplers. Alternatively

Externí odkaz: http://arxiv.org/abs/2406.16852

Zobrazit plný text záznamu

Report

Beyond Raw Videos: Understanding Edited Videos with Large Multimodal Model

Autor: Xu, Lu, Zhu, Sijie, Li, Chunyuan, Kuo, Chia-Wen, Chen, Fan, Wang, Xinyao, Chen, Guang, Du, Dawei, Yuan, Ye, Wen, Longyin

The emerging video LMMs (Large Multimodal Models) have achieved significant improvements on generic video understanding in the form of VQA (Visual Question Answering), where the raw videos are captured by cameras. However, a large portion of videos i

Externí odkaz: http://arxiv.org/abs/2406.10484

Zobrazit plný text záznamu

Report

MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding

We introduce MuirBench, a comprehensive benchmark that focuses on robust multi-image understanding capabilities of multimodal LLMs. MuirBench consists of 12 diverse multi-image tasks (e.g., scene understanding, ordering) that involve 10 categories of

Externí odkaz: http://arxiv.org/abs/2406.09411

Zobrazit plný text záznamu

Report

Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment

Autor: Xiao, Xin, Wu, Bohong, Wang, Jiacong, Li, Chunyuan, Zhou, Xun, Guo, Haoyuan

Existing image-text modality alignment in Vision Language Models (VLMs) treats each text token equally in an autoregressive manner. Despite being simple and effective, this method results in sub-optimal cross-modal alignment by over-emphasizing the t

Externí odkaz: http://arxiv.org/abs/2405.17871

Zobrazit plný text záznamu

Report

Graphic Design with Large Multimodal Model

Autor: Cheng, Yutao, Zhang, Zhao, Yang, Maoke, Nie, Hui, Li, Chunyuan, Wu, Xinglong, Shao, Jie

In the field of graphic design, automating the integration of design elements into a cohesive multi-layered artwork not only boosts productivity but also paves the way for the democratization of graphic design. One existing practice is Graphic Layout

Externí odkaz: http://arxiv.org/abs/2404.14368

Zobrazit plný text záznamu

Report

Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward

Autor: Zhang, Ruohong, Gui, Liangke, Sun, Zhiqing, Feng, Yihao, Xu, Keyang, Zhang, Yuanhan, Fu, Di, Li, Chunyuan, Hauptmann, Alexander, Bisk, Yonatan, Yang, Yiming

Preference modeling techniques, such as direct preference optimization (DPO), has shown effective in enhancing the generalization abilities of large language model (LLM). However, in tasks involving video instruction-following, providing informative

Externí odkaz: http://arxiv.org/abs/2404.01258

Zobrazit plný text záznamu

Report

Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation

The scaling laws and extraordinary performance of large foundation models motivate the development and utilization of such models in biomedicine. However, despite early promising results on some biomedical benchmarks, there are still major challenges

Externí odkaz: http://arxiv.org/abs/2403.08002

Zobrazit plný text záznamu

Report

TrustLLM: Trustworthiness in Large Language Models

Autor: Sun, Lichao, Huang, Yue, Wang, Haoran, Wu, Siyuan, Zhang, Qihui, Li, Yuan, Gao, Chujie, Huang, Yixin, Lyu, Wenhan, Zhang, Yixuan, Li, Xiner, Liu, Zhengliang, Liu, Yixin, Wang, Yijue, Zhang, Zhikun, Vidgen, Bertie, Kailkhura, Bhavya, Xiong, Caiming, Xiao, Chaowei, Li, Chunyuan, Xing, Eric, Huang, Furong, Liu, Hao, Ji, Heng, Wang, Hongyi, Zhang, Huan, Yao, Huaxiu, Kellis, Manolis, Zitnik, Marinka, Jiang, Meng, Bansal, Mohit, Zou, James, Pei, Jian, Liu, Jian, Gao, Jianfeng, Han, Jiawei, Zhao, Jieyu, Tang, Jiliang, Wang, Jindong, Vanschoren, Joaquin, Mitchell, John, Shu, Kai, Xu, Kaidi, Chang, Kai-Wei, He, Lifang, Huang, Lifu, Backes, Michael, Gong, Neil Zhenqiang, Yu, Philip S., Chen, Pin-Yu, Gu, Quanquan, Xu, Ran, Ying, Rex, Ji, Shuiwang, Jana, Suman, Chen, Tianlong, Liu, Tianming, Zhou, Tianyi, Wang, William, Li, Xiang, Zhang, Xiangliang, Wang, Xiao, Xie, Xing, Chen, Xun, Wang, Xuyu, Liu, Yan, Ye, Yanfang, Cao, Yinzhi, Chen, Yong, Zhao, Yue

Large language models (LLMs), exemplified by ChatGPT, have gained considerable attention for their excellent natural language processing capabilities. Nonetheless, these LLMs present many challenges, particularly in the realm of trustworthiness. Ther

Externí odkaz: http://arxiv.org/abs/2401.05561

Zobrazit plný text záznamu

Report

LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models

Autor: Zhang, Hao, Li, Hongyang, Li, Feng, Ren, Tianhe, Zou, Xueyan, Liu, Shilong, Huang, Shijia, Gao, Jianfeng, Zhang, Lei, Li, Chunyuan, Yang, Jianwei

With the recent significant advancements in large multi-modal models (LMMs), the importance of their grounding capability in visual chat is increasingly recognized. Despite recent efforts to enable LMMs to support grounding, their capabilities for gr

Externí odkaz: http://arxiv.org/abs/2312.02949

Zobrazit plný text záznamu

Report

Visual In-Context Prompting

Autor: Li, Feng, Jiang, Qing, Zhang, Hao, Ren, Tianhe, Liu, Shilong, Zou, Xueyan, Xu, Huaizhe, Li, Hongyang, Li, Chunyuan, Yang, Jianwei, Zhang, Lei, Gao, Jianfeng

In-context prompting in large language models (LLMs) has become a prevalent approach to improve zero-shot capabilities, but this idea is less explored in the vision domain. Existing visual prompting methods focus on referring segmentation to segment

Externí odkaz: http://arxiv.org/abs/2311.13601

Zobrazit plný text záznamu

Vyhledávací nástroje:

Upřesnit hledání