Showing 1 - 10 of 41 for search: '"Cai Zefan"'
Author:
Gao, Bofei; Song, Feifan; Yang, Zhe; Cai, Zefan; Miao, Yibo; Dong, Qingxiu; Li, Lei; Ma, Chenghao; Chen, Liang; Xu, Runxin; Tang, Zhengyang; Wang, Benyou; Zan, Daoguang; Quan, Shanghaoran; Zhang, Ge; Sha, Lei; Zhang, Yichang; Ren, Xuancheng; Liu, Tianyu; Chang, Baobao
Recent advancements in large language models (LLMs) have led to significant breakthroughs in mathematical reasoning capabilities. However, existing benchmarks like GSM8K or MATH are now being solved with high accuracy (e.g., OpenAI o1 achieves 94.8% …)
External link:
http://arxiv.org/abs/2410.07985
Rapid progress on multi-modal agents built on large foundation models has largely overlooked their potential for language-based communication between agents in collaborative tasks. This oversight presents a critical gap in understanding their effectiveness …
External link:
http://arxiv.org/abs/2410.07553
Author:
Chen, Liang; Tan, Sinan; Cai, Zefan; Xie, Weichu; Zhao, Haozhe; Zhang, Yichi; Lin, Junyang; Bai, Jinze; Liu, Tianyu; Chang, Baobao
This work tackles the information loss bottleneck of vector-quantization (VQ) autoregressive image generation by introducing a novel model architecture called the 2-Dimensional Autoregression (DnD) Transformer. The DnD-Transformer predicts more codes …
External link:
http://arxiv.org/abs/2410.01912
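
As a rough, hedged illustration of the idea in the snippet above: plain VQ autoregression emits one code per step, while a depth-wise second direction lets each step emit several codes for the same position. Everything below (the trunk, head count, greedy loop) is a toy assumption for illustration, not the DnD-Transformer's actual architecture or API.

import torch
import torch.nn as nn

SEQ_LEN, DEPTH, VOCAB, DIM = 16, 3, 1024, 64

class ToyDnDDecoder(nn.Module):
    """Stand-in decoder: one shared trunk, one output head per code depth."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.trunk = nn.GRU(DIM, DIM, batch_first=True)  # stub for a transformer
        self.heads = nn.ModuleList(nn.Linear(DIM, VOCAB) for _ in range(DEPTH))

    def forward(self, codes):                 # codes: (B, T) coarsest-level codes
        h, _ = self.trunk(self.embed(codes))  # (B, T, DIM)
        # Each head predicts one code for the current position, so a single
        # step yields DEPTH codes instead of the usual one.
        return [head(h[:, -1]) for head in self.heads]  # DEPTH x (B, VOCAB)

model = ToyDnDDecoder()
seq = torch.zeros(1, 1, dtype=torch.long)     # start token
for _ in range(SEQ_LEN):
    logits = model(seq)                       # DEPTH logit tensors per step
    codes = [l.argmax(-1) for l in logits]    # greedy: one code per depth
    seq = torch.cat([seq, codes[0][:, None]], dim=1)  # feed coarsest code back
print(seq.shape)  # torch.Size([1, 17])

In this toy loop only the coarsest code is fed back; the point is just that each step produces a small stack of codes rather than one, which is the extra "depth" direction the abstract alludes to.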
Author:
Gao, Bofei; Song, Feifan; Miao, Yibo; Cai, Zefan; Yang, Zhe; Chen, Liang; Hu, Helan; Xu, Runxin; Dong, Qingxiu; Zheng, Ce; Xiao, Wen; Zhang, Ge; Zan, Daoguang; Lu, Keming; Yu, Bowen; Liu, Dayiheng; Cui, Zeyu; Yang, Jian; Sha, Lei; Wang, Houfeng; Sui, Zhifang; Wang, Peiyi; Liu, Tianyu; Chang, Baobao
Large Language Models (LLMs) exhibit remarkably powerful capabilities. One crucial factor in this success is aligning the LLM's output with human preferences. This alignment process often requires only a small amount of data to efficiently …
External link:
http://arxiv.org/abs/2409.02795
Author:
Gao, Bofei; Cai, Zefan; Xu, Runxin; Wang, Peiyi; Zheng, Ce; Lin, Runji; Lu, Keming; Liu, Dayiheng; Zhou, Chang; Xiao, Wen; Hu, Junjie; Liu, Tianyu; Chang, Baobao
Mathematical verifiers have recently achieved success on mathematical reasoning tasks by validating the correctness of solutions generated by policy models. However, existing verifiers are trained with binary classification labels, which …
External link:
http://arxiv.org/abs/2406.14024
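
For context on the setup the snippet above criticizes, here is a minimal, generic sketch of a binary-label verifier: a scalar score per solution trained with 0/1 correctness targets. The toy model and random stand-in encodings are assumptions for illustration, not this paper's implementation or its proposed alternative.

import torch
import torch.nn as nn

class ToyVerifier(nn.Module):
    def __init__(self, dim=32):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, solution_repr):         # (B, dim) pooled solution encodings
        return self.score(solution_repr).squeeze(-1)  # (B,) correctness logits

verifier = ToyVerifier()
reprs = torch.randn(8, 32)                    # stand-in solution encodings
labels = torch.randint(0, 2, (8,)).float()    # 1 = correct, 0 = incorrect
loss = nn.functional.binary_cross_entropy_with_logits(verifier(reprs), labels)
loss.backward()

The 0/1 target collapses "almost right" and "completely wrong" solutions into a single negative class, which is the kind of information loss such binary labels incur.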
Author:
Cai, Zefan; Zhang, Yichi; Gao, Bofei; Liu, Yuliang; Liu, Tianyu; Lu, Keming; Xiong, Wayne; Dong, Yue; Chang, Baobao; Hu, Junjie; Xiao, Wen
In this study, we investigate whether attention-based information flow inside large language models (LLMs) is aggregated through noticeable patterns for long context processing. Our observations reveal that LLMs aggregate information through Pyramidal Information Funneling …
External link:
http://arxiv.org/abs/2406.02069
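
One way to picture the pyramidal pattern described above is as a KV-cache budget that shrinks with layer depth: lower layers, where attention spreads widely, keep more cached tokens; upper layers, which focus on a few critical tokens, keep fewer. The linear schedule and the numbers below are illustrative assumptions, not the paper's actual allocation algorithm.

def pyramidal_budgets(num_layers: int, total_budget: int,
                      top_ratio: float = 0.2) -> list[int]:
    """Split total_budget cached tokens across layers, shrinking with depth."""
    avg = total_budget / num_layers
    top = avg * top_ratio                # smallest budget, at the last layer
    bottom = 2 * avg - top               # largest budget, at the first layer
    step = (bottom - top) / max(num_layers - 1, 1)
    return [round(bottom - i * step) for i in range(num_layers)]

budgets = pyramidal_budgets(num_layers=8, total_budget=8 * 256)
print(budgets)       # [461, 402, 344, 285, 227, 168, 110, 51]
print(sum(budgets))  # 2048: same total as a uniform 256 tokens per layer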
Large-scale multilingual Pretrained Language Models (mPLMs) yield impressive performance on cross-language tasks, yet significant performance disparities exist across different languages within the same mPLM. Previous studies endeavored to narrow the …
External link:
http://arxiv.org/abs/2404.08491
Author:
Cai, Zefan; Kung, Po-Nien; Suvarna, Ashima; Ma, Mingyu Derek; Bansal, Hritik; Chang, Baobao; Brantingham, P. Jeffrey; Wang, Wei; Peng, Nanyun
Existing approaches to zero-shot event detection usually train models on datasets annotated with known event types and prompt them with unseen event definitions. These approaches yield sporadic successes, yet generally fall short of expectations …
External link:
http://arxiv.org/abs/2403.02586
Author:
Chen, Liang; Zhang, Yichi; Ren, Shuhuai; Zhao, Haozhe; Cai, Zefan; Wang, Yuchi; Wang, Peiyi; Meng, Xiangdi; Liu, Tianyu; Chang, Baobao
We present PCA-Bench, a multimodal decision-making benchmark for evaluating the integrated capabilities of Multimodal Large Language Models (MLLMs). Departing from previous benchmarks focusing on simplistic tasks and individual model capability, PCA-Bench …
External link:
http://arxiv.org/abs/2402.15527
Author:
Zhang, Rongyu; Cai, Zefan; Yang, Huanrui; Liu, Zidong; Gudovskiy, Denis; Okuno, Tomoyuki; Nakata, Yohei; Keutzer, Kurt; Chang, Baobao; Du, Yuan; Du, Li; Zhang, Shanghang
Finetuning a pretrained vision model (PVM) is a common technique for learning downstream vision tasks. However, the conventional finetuning process with randomly sampled data points results in diminished training efficiency. To address this drawback …
External link:
http://arxiv.org/abs/2401.07853
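
The snippet above contrasts random sampling with a more deliberate choice of finetuning data. A common generic strategy, shown here as an assumption rather than this paper's actual method, is to score the candidate pool with the current model and keep the most informative (here: highest-loss) examples.

import torch
import torch.nn as nn

model = nn.Linear(16, 4)                  # stand-in head on a pretrained vision model
criterion = nn.CrossEntropyLoss(reduction="none")

pool_x = torch.randn(100, 16)             # candidate pool of datapoints
pool_y = torch.randint(0, 4, (100,))

with torch.no_grad():
    per_example_loss = criterion(model(pool_x), pool_y)  # (100,) per-example losses

k = 32                                    # finetuning budget for this round
idx = per_example_loss.topk(k).indices    # keep the k hardest examples
batch_x, batch_y = pool_x[idx], pool_y[idx]
print(batch_x.shape)                      # torch.Size([32, 16]); finetune on these
                                          # instead of a uniformly random subset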