Showing 1 - 10 of 41 for search: '"Cai Zefan"'
Author:
Gao, Bofei; Song, Feifan; Yang, Zhe; Cai, Zefan; Miao, Yibo; Dong, Qingxiu; Li, Lei; Ma, Chenghao; Chen, Liang; Xu, Runxin; Tang, Zhengyang; Wang, Benyou; Zan, Daoguang; Quan, Shanghaoran; Zhang, Ge; Sha, Lei; Zhang, Yichang; Ren, Xuancheng; Liu, Tianyu; Chang, Baobao
Recent advancements in large language models (LLMs) have led to significant breakthroughs in mathematical reasoning capabilities. However, existing benchmarks like GSM8K or MATH are now being solved with high accuracy (e.g., OpenAI o1 achieves 94.8% …)
External link:
http://arxiv.org/abs/2410.07985
Rapid progress on multi-modal agents built on large foundation models has largely overlooked their potential for language-based communication between agents in collaborative tasks. This oversight presents a critical gap in understanding their effectiveness …
External link:
http://arxiv.org/abs/2410.07553
Author:
Chen, Liang; Tan, Sinan; Cai, Zefan; Xie, Weichu; Zhao, Haozhe; Zhang, Yichi; Lin, Junyang; Bai, Jinze; Liu, Tianyu; Chang, Baobao
This work tackles the information loss bottleneck of vector-quantization (VQ) autoregressive image generation by introducing a novel model architecture called the 2-Dimensional Autoregression (DnD) Transformer. The DnD-Transformer predicts more codes …
External link:
http://arxiv.org/abs/2410.01912
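
As a rough, hedged illustration of the idea in the snippet above: plain VQ autoregression emits one code per step, while a depth-wise second direction lets each step emit several codes for the same position. Everything below (the trunk, head count, greedy loop) is a toy assumption for illustration, not the DnD-Transformer's actual architecture or API.

import torch
import torch.nn as nn

SEQ_LEN, DEPTH, VOCAB, DIM = 16, 3, 1024, 64

class ToyDnDDecoder(nn.Module):
    """Stand-in decoder: one shared trunk, one output head per code depth."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.trunk = nn.GRU(DIM, DIM, batch_first=True)  # stub for a transformer
        self.heads = nn.ModuleList(nn.Linear(DIM, VOCAB) for _ in range(DEPTH))

    def forward(self, codes):                 # codes: (B, T) coarsest-level codes
        h, _ = self.trunk(self.embed(codes))  # (B, T, DIM)
        # Each head predicts one code for the current position, so a single
        # step yields DEPTH codes instead of the usual one.
        return [head(h[:, -1]) for head in self.heads]  # DEPTH x (B, VOCAB)

model = ToyDnDDecoder()
seq = torch.zeros(1, 1, dtype=torch.long)     # start token
for _ in range(SEQ_LEN):
    logits = model(seq)                       # DEPTH logit tensors per step
    codes = [l.argmax(-1) for l in logits]    # greedy: one code per depth
    seq = torch.cat([seq, codes[0][:, None]], dim=1)  # feed coarsest code back
print(seq.shape)  # torch.Size([1, 17])

In this toy loop only the coarsest code is fed back; the point is just that each step produces a small stack of codes rather than one, which is the extra "depth" direction the abstract alludes to.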
Author:
Gao, Bofei; Song, Feifan; Miao, Yibo; Cai, Zefan; Yang, Zhe; Chen, Liang; Hu, Helan; Xu, Runxin; Dong, Qingxiu; Zheng, Ce; Xiao, Wen; Zhang, Ge; Zan, Daoguang; Lu, Keming; Yu, Bowen; Liu, Dayiheng; Cui, Zeyu; Yang, Jian; Sha, Lei; Wang, Houfeng; Sui, Zhifang; Wang, Peiyi; Liu, Tianyu; Chang, Baobao
Large Language Models (LLMs) exhibit remarkably powerful capabilities. One crucial factor in this success is aligning the LLM's output with human preferences. This alignment process often requires only a small amount of data to efficiently …
External link:
http://arxiv.org/abs/2409.02795
Author:
Gao, Bofei; Cai, Zefan; Xu, Runxin; Wang, Peiyi; Zheng, Ce; Lin, Runji; Lu, Keming; Liu, Dayiheng; Zhou, Chang; Xiao, Wen; Hu, Junjie; Liu, Tianyu; Chang, Baobao
Mathematical verifiers have recently achieved success on mathematical reasoning tasks by validating the correctness of solutions generated by policy models. However, existing verifiers are trained with binary classification labels, which …
External link:
http://arxiv.org/abs/2406.14024
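
For context on the setup the snippet above criticizes, here is a minimal, generic sketch of a binary-label verifier: a scalar score per solution trained with 0/1 correctness targets. The toy model and random stand-in encodings are assumptions for illustration, not this paper's implementation or its proposed alternative.

import torch
import torch.nn as nn

class ToyVerifier(nn.Module):
    def __init__(self, dim=32):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, solution_repr):         # (B, dim) pooled solution encodings
        return self.score(solution_repr).squeeze(-1)  # (B,) correctness logits

verifier = ToyVerifier()
reprs = torch.randn(8, 32)                    # stand-in solution encodings
labels = torch.randint(0, 2, (8,)).float()    # 1 = correct, 0 = incorrect
loss = nn.functional.binary_cross_entropy_with_logits(verifier(reprs), labels)
loss.backward()

The 0/1 target collapses "almost right" and "completely wrong" solutions into a single negative class, which is the kind of information loss such binary labels incur.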
Author:
Cai, Zefan; Zhang, Yichi; Gao, Bofei; Liu, Yuliang; Liu, Tianyu; Lu, Keming; Xiong, Wayne; Dong, Yue; Chang, Baobao; Hu, Junjie; Xiao, Wen
In this study, we investigate whether attention-based information flow inside large language models (LLMs) is aggregated through noticeable patterns for long context processing. Our observations reveal that LLMs aggregate information through Pyramidal Information Funneling …
External link:
http://arxiv.org/abs/2406.02069
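
One way to picture the pyramidal pattern described above is as a KV-cache budget that shrinks with layer depth: lower layers, where attention spreads widely, keep more cached tokens; upper layers, which focus on a few critical tokens, keep fewer. The linear schedule and the numbers below are illustrative assumptions, not the paper's actual allocation algorithm.

def pyramidal_budgets(num_layers: int, total_budget: int,
                      top_ratio: float = 0.2) -> list[int]:
    """Split total_budget cached tokens across layers, shrinking with depth."""
    avg = total_budget / num_layers
    top = avg * top_ratio                # smallest budget, at the last layer
    bottom = 2 * avg - top               # largest budget, at the first layer
    step = (bottom - top) / max(num_layers - 1, 1)
    return [round(bottom - i * step) for i in range(num_layers)]

budgets = pyramidal_budgets(num_layers=8, total_budget=8 * 256)
print(budgets)       # [461, 402, 344, 285, 227, 168, 110, 51]
print(sum(budgets))  # 2048: same total as a uniform 256 tokens per layer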
Large-scale multilingual Pretrained Language Models (mPLMs) yield impressive performance on cross-language tasks, yet significant performance disparities exist across different languages within the same mPLM. Previous studies endeavored to narrow the …
External link:
http://arxiv.org/abs/2404.08491
Author:
Cai, Zefan; Kung, Po-Nien; Suvarna, Ashima; Ma, Mingyu Derek; Bansal, Hritik; Chang, Baobao; Brantingham, P. Jeffrey; Wang, Wei; Peng, Nanyun
Existing approaches to zero-shot event detection usually train models on datasets annotated with known event types and prompt them with unseen event definitions. These approaches yield sporadic successes, yet generally fall short of expectations …
External link:
http://arxiv.org/abs/2403.02586
Author:
Chen, Liang; Zhang, Yichi; Ren, Shuhuai; Zhao, Haozhe; Cai, Zefan; Wang, Yuchi; Wang, Peiyi; Meng, Xiangdi; Liu, Tianyu; Chang, Baobao
We present PCA-Bench, a multimodal decision-making benchmark for evaluating the integrated capabilities of Multimodal Large Language Models (MLLMs). Departing from previous benchmarks focusing on simplistic tasks and individual model capability, PCA-Bench …
External link:
http://arxiv.org/abs/2402.15527
Author:
Zhang, Rongyu; Cai, Zefan; Yang, Huanrui; Liu, Zidong; Gudovskiy, Denis; Okuno, Tomoyuki; Nakata, Yohei; Keutzer, Kurt; Chang, Baobao; Du, Yuan; Du, Li; Zhang, Shanghang
Finetuning a pretrained vision model (PVM) is a common technique for learning downstream vision tasks. However, the conventional finetuning process with randomly sampled data points results in diminished training efficiency. To address this drawback …
External link:
http://arxiv.org/abs/2401.07853
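
The snippet above contrasts random sampling with a more deliberate choice of finetuning data. A common generic strategy, shown here as an assumption rather than this paper's actual method, is to score the candidate pool with the current model and keep the most informative (here: highest-loss) examples.

import torch
import torch.nn as nn

model = nn.Linear(16, 4)                  # stand-in head on a pretrained vision model
criterion = nn.CrossEntropyLoss(reduction="none")

pool_x = torch.randn(100, 16)             # candidate pool of datapoints
pool_y = torch.randint(0, 4, (100,))

with torch.no_grad():
    per_example_loss = criterion(model(pool_x), pool_y)  # (100,) per-example losses

k = 32                                    # finetuning budget for this round
idx = per_example_loss.topk(k).indices    # keep the k hardest examples
batch_x, batch_y = pool_x[idx], pool_y[idx]
print(batch_x.shape)                      # torch.Size([32, 16]); finetune on these
                                          # instead of a uniformly random subset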