Showing 1 - 10 of 353 for search: '"Chang, Baobao"'
Author:
Zhao, Haozhe, Ma, Xiaojian, Chen, Liang, Si, Shuzheng, Wu, Rujie, An, Kaikai, Yu, Peiyu, Zhang, Minjia, Li, Qing, Chang, Baobao
This paper presents UltraEdit, a large-scale (approximately 4 million editing samples), automatically generated dataset for instruction-based image editing. Our key idea is to address the drawbacks in existing image editing datasets like InstructPix2
External link:
http://arxiv.org/abs/2407.05282
Author:
Huang, Jinsheng, Chen, Liang, Guo, Taian, Zeng, Fu, Zhao, Yusheng, Wu, Bohan, Yuan, Ye, Zhao, Haozhe, Guo, Zhihui, Zhang, Yichi, Yuan, Jingyang, Ju, Wei, Liu, Luchen, Liu, Tianyu, Chang, Baobao, Zhang, Ming
Large Multimodal Models (LMMs) exhibit impressive cross-modal understanding and reasoning abilities, often assessed through multiple-choice questions (MCQs) that include an image, a question, and several options. However, many benchmarks used for suc
External link:
http://arxiv.org/abs/2407.00468
Author:
Gao, Bofei, Cai, Zefan, Xu, Runxin, Wang, Peiyi, Zheng, Ce, Lin, Runji, Lu, Keming, Liu, Dayiheng, Zhou, Chang, Xiao, Wen, Hu, Junjie, Liu, Tianyu, Chang, Baobao
Mathematical verifiers achieve success in mathematical reasoning tasks by validating the correctness of solutions. However, existing verifiers are trained with binary classification labels, which are not informative enough for the model to accurately
External link:
http://arxiv.org/abs/2406.14024
Author:
Ping, Bowen, Wang, Shuo, Wang, Hanqing, Han, Xu, Xu, Yuzhuang, Yan, Yukun, Chen, Yun, Chang, Baobao, Liu, Zhiyuan, Sun, Maosong
Fine-tuning is a crucial process for adapting large language models (LLMs) to diverse applications. In certain scenarios, such as multi-tenant serving, deploying multiple LLMs becomes necessary to meet complex demands. Recent studies suggest decompos
External link:
http://arxiv.org/abs/2406.08903
Author:
Cai, Zefan, Zhang, Yichi, Gao, Bofei, Liu, Yuliang, Liu, Tianyu, Lu, Keming, Xiong, Wayne, Dong, Yue, Chang, Baobao, Hu, Junjie, Xiao, Wen
In this study, we investigate whether attention-based information flow inside large language models (LLMs) is aggregated through noticeable patterns for long context processing. Our observations reveal that LLMs aggregate information through Pyramida
External link:
http://arxiv.org/abs/2406.02069
Large-scale multilingual Pretrained Language Models (mPLMs) yield impressive performance on cross-language tasks, yet significant performance disparities exist across different languages within the same mPLM. Previous studies endeavored to narrow the
External link:
http://arxiv.org/abs/2404.08491
In this study, we identify the inefficient attention phenomena in Large Vision-Language Models (LVLMs), notably within prominent models like LLaVA-1.5, QwenVL-Chat and Video-LLaVA. We find that the attention computation over visual tokens is of e
External link:
http://arxiv.org/abs/2403.06764
Author:
Cai, Zefan, Kung, Po-Nien, Suvarna, Ashima, Ma, Mingyu Derek, Bansal, Hritik, Chang, Baobao, Brantingham, P. Jeffrey, Wang, Wei, Peng, Nanyun
Existing approaches on zero-shot event detection usually train models on datasets annotated with known event types, and prompt them with unseen event definitions. These approaches yield sporadic successes, yet generally fall short of expectations. In
External link:
http://arxiv.org/abs/2403.02586
Author:
Chen, Liang, Zhang, Yichi, Ren, Shuhuai, Zhao, Haozhe, Cai, Zefan, Wang, Yuchi, Wang, Peiyi, Meng, Xiangdi, Liu, Tianyu, Chang, Baobao
We present PCA-Bench, a multimodal decision-making benchmark for evaluating the integrated capabilities of Multimodal Large Language Models (MLLMs). Departing from previous benchmarks focusing on simplistic tasks and individual model capability, PCA-
External link:
http://arxiv.org/abs/2402.15527
Author:
Zhang, Rongyu, Cai, Zefan, Yang, Huanrui, Liu, Zidong, Gudovskiy, Denis, Okuno, Tomoyuki, Nakata, Yohei, Keutzer, Kurt, Chang, Baobao, Du, Yuan, Du, Li, Zhang, Shanghang
Finetuning a pretrained vision model (PVM) is a common technique for learning downstream vision tasks. However, the conventional finetuning process with randomly sampled data points results in diminished training efficiency. To address this drawback,
External link:
http://arxiv.org/abs/2401.07853