Search results

Report

MultiChartQA: Benchmarking Vision-Language Models on Multi-Chart Problems

Authors: Zhu, Zifeng; Jia, Mengzhao; Zhang, Zhihan; Li, Lang; Jiang, Meng

Multimodal Large Language Models (MLLMs) have demonstrated impressive abilities across various tasks, including visual question answering and chart comprehension, yet existing benchmarks for chart-related tasks fall short in capturing the complexity

External link: http://arxiv.org/abs/2410.14179

Report

Enhancing Mathematical Reasoning in LLMs by Stepwise Correction

Authors: Wu, Zhenyu; Zeng, Qingkai; Zhang, Zhihan; Tan, Zhaoxuan; Shen, Chao; Jiang, Meng

Best-of-N decoding methods instruct large language models (LLMs) to generate multiple solutions, score each using a scoring function, and select the highest scored as the final answer to mathematical reasoning problems. However, this repeated indepen

External link: http://arxiv.org/abs/2410.12934

Report

On the sample complexity of purity and inner product estimation

Authors: Gong, Weiyuan; Haferkamp, Jonas; Ye, Qi; Zhang, Zhihan

We study the sample complexity of the prototypical tasks quantum purity estimation and quantum inner product estimation. In purity estimation, we are to estimate $\mathrm{tr}(\rho^2)$ of an unknown quantum state $\rho$ to additive error $\epsilon$. Meanwhile,

External link: http://arxiv.org/abs/2410.12712

Report

Language Model Preference Evaluation with Multiple Weak Evaluators

Authors: Hu, Zhengyu; Zhang, Jieyu; Xiong, Zhihan; Ratner, Alexander; Xiong, Hui; Krishna, Ranjay

Despite the remarkable success of Large Language Models (LLMs), evaluating their outputs' quality regarding preference remains a critical challenge. Existing works usually leverage a powerful LLM (e.g., GPT4) as the judge for comparing LLMs' output p

External link: http://arxiv.org/abs/2410.12869

Report

Reward-Augmented Data Enhances Direct Preference Alignment of LLMs

Authors: Zhang, Shenao; Liu, Zhihan; Liu, Boyi; Zhang, Yufeng; Yang, Yingxiang; Liu, Yongfei; Chen, Liyu; Sun, Tao; Wang, Zhaoran

Preference alignment in Large Language Models (LLMs) has significantly improved their ability to adhere to human instructions and intentions. However, existing direct alignment algorithms primarily focus on relative preferences and often overlook the

External link: http://arxiv.org/abs/2410.08067

Report

TOWER: Tree Organized Weighting for Evaluating Complex Instructions

Authors: Ziems, Noah; Zhang, Zhihan; Jiang, Meng

Evaluating the ability of large language models (LLMs) to follow complex human-written instructions is essential for their deployment in real-world applications. While benchmarks like Chatbot Arena use human judges to assess model performance, they a

External link: http://arxiv.org/abs/2410.06089

Report

Efficient self-consistent learning of gate set Pauli noise

Authors: Chen, Senrui; Zhang, Zhihan; Jiang, Liang; Flammia, Steven T.

Understanding quantum noise is an essential step towards building practical quantum information processing systems. Pauli noise is a useful model that has been widely applied in quantum benchmarking, error mitigation, and error correction. Despite in

External link: http://arxiv.org/abs/2410.03906

Report

Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks

Authors: Jia, Mengzhao; Yu, Wenhao; Ma, Kaixin; Fang, Tianqing; Zhang, Zhihan; Ouyang, Siru; Zhang, Hongming; Jiang, Meng; Yu, Dong

Text-rich images, where text serves as the central visual element guiding the overall understanding, are prevalent in real-world applications, such as presentation slides, scanned documents, and webpage snapshots. Tasks involving multiple text-rich i

External link: http://arxiv.org/abs/2410.01744

Report

Dual Approximation Policy Optimization

Authors: Xiong, Zhihan; Fazel, Maryam; Xiao, Lin

We propose Dual Approximation Policy Optimization (DAPO), a framework that incorporates general function approximation into policy mirror descent methods. In contrast to the popular approach of using the $L_2$-norm to measure function approximation e

External link: http://arxiv.org/abs/2410.01249

Report

Just Say What You Want: Only-prompting Self-rewarding Online Preference Optimization

Authors: Xu, Ruijie; Liu, Zhihan; Liu, Yongfei; Yan, Shipeng; Wang, Zhaoran; Zhang, Zhi; He, Xuming

We address the challenge of online Reinforcement Learning from Human Feedback (RLHF) with a focus on self-rewarding alignment methods. In online RLHF, obtaining feedback requires interaction with the environment, which can be costly when using additi

External link: http://arxiv.org/abs/2409.17534
