Showing 1 - 10 of 353 for search: '"Chang, Baobao"'
Author:
Zhao, Haozhe, Ma, Xiaojian, Chen, Liang, Si, Shuzheng, Wu, Rujie, An, Kaikai, Yu, Peiyu, Zhang, Minjia, Li, Qing, Chang, Baobao
This paper presents UltraEdit, a large-scale (approximately 4 million editing samples), automatically generated dataset for instruction-based image editing. Our key idea is to address the drawbacks in existing image editing datasets like InstructPix2
External link:
http://arxiv.org/abs/2407.05282
Author:
Huang, Jinsheng, Chen, Liang, Guo, Taian, Zeng, Fu, Zhao, Yusheng, Wu, Bohan, Yuan, Ye, Zhao, Haozhe, Guo, Zhihui, Zhang, Yichi, Yuan, Jingyang, Ju, Wei, Liu, Luchen, Liu, Tianyu, Chang, Baobao, Zhang, Ming
Large Multimodal Models (LMMs) exhibit impressive cross-modal understanding and reasoning abilities, often assessed through multiple-choice questions (MCQs) that include an image, a question, and several options. However, many benchmarks used for suc
External link:
http://arxiv.org/abs/2407.00468
Author:
Gao, Bofei, Cai, Zefan, Xu, Runxin, Wang, Peiyi, Zheng, Ce, Lin, Runji, Lu, Keming, Liu, Dayiheng, Zhou, Chang, Xiao, Wen, Hu, Junjie, Liu, Tianyu, Chang, Baobao
Mathematical verifiers achieve success in mathematical reasoning tasks by validating the correctness of solutions. However, existing verifiers are trained with binary classification labels, which are not informative enough for the model to accurately
External link:
http://arxiv.org/abs/2406.14024
Author:
Ping, Bowen, Wang, Shuo, Wang, Hanqing, Han, Xu, Xu, Yuzhuang, Yan, Yukun, Chen, Yun, Chang, Baobao, Liu, Zhiyuan, Sun, Maosong
Fine-tuning is a crucial process for adapting large language models (LLMs) to diverse applications. In certain scenarios, such as multi-tenant serving, deploying multiple LLMs becomes necessary to meet complex demands. Recent studies suggest decompos
External link:
http://arxiv.org/abs/2406.08903
Author:
Cai, Zefan, Zhang, Yichi, Gao, Bofei, Liu, Yuliang, Liu, Tianyu, Lu, Keming, Xiong, Wayne, Dong, Yue, Chang, Baobao, Hu, Junjie, Xiao, Wen
In this study, we investigate whether attention-based information flow inside large language models (LLMs) is aggregated through noticeable patterns for long context processing. Our observations reveal that LLMs aggregate information through Pyramida
External link:
http://arxiv.org/abs/2406.02069
Large-scale multilingual Pretrained Language Models (mPLMs) yield impressive performance on cross-language tasks, yet significant performance disparities exist across different languages within the same mPLM. Previous studies endeavored to narrow the
External link:
http://arxiv.org/abs/2404.08491
In this study, we identify the inefficient attention phenomena in Large Vision-Language Models (LVLMs), notably within prominent models like LLaVA-1.5, QwenVL-Chat and Video-LLaVA. We find that the attention computation over visual tokens is of e
External link:
http://arxiv.org/abs/2403.06764
Author:
Cai, Zefan, Kung, Po-Nien, Suvarna, Ashima, Ma, Mingyu Derek, Bansal, Hritik, Chang, Baobao, Brantingham, P. Jeffrey, Wang, Wei, Peng, Nanyun
Existing approaches on zero-shot event detection usually train models on datasets annotated with known event types, and prompt them with unseen event definitions. These approaches yield sporadic successes, yet generally fall short of expectations. In
External link:
http://arxiv.org/abs/2403.02586
Author:
Chen, Liang, Zhang, Yichi, Ren, Shuhuai, Zhao, Haozhe, Cai, Zefan, Wang, Yuchi, Wang, Peiyi, Meng, Xiangdi, Liu, Tianyu, Chang, Baobao
We present PCA-Bench, a multimodal decision-making benchmark for evaluating the integrated capabilities of Multimodal Large Language Models (MLLMs). Departing from previous benchmarks focusing on simplistic tasks and individual model capability, PCA-
External link:
http://arxiv.org/abs/2402.15527
Author:
Zhang, Rongyu, Cai, Zefan, Yang, Huanrui, Liu, Zidong, Gudovskiy, Denis, Okuno, Tomoyuki, Nakata, Yohei, Keutzer, Kurt, Chang, Baobao, Du, Yuan, Du, Li, Zhang, Shanghang
Finetuning a pretrained vision model (PVM) is a common technique for learning downstream vision tasks. However, the conventional finetuning process with randomly sampled data points results in diminished training efficiency. To address this drawback,
External link:
http://arxiv.org/abs/2401.07853