Showing 1 - 10
of 8,467
for search: '"An, Zhihan"'
Multimodal Large Language Models (MLLMs) have demonstrated impressive abilities across various tasks, including visual question answering and chart comprehension, yet existing benchmarks for chart-related tasks fall short in capturing the complexity
External link:
http://arxiv.org/abs/2410.14179
Best-of-N decoding methods instruct large language models (LLMs) to generate multiple solutions, score each using a scoring function, and select the highest scored as the final answer to mathematical reasoning problems. However, this repeated indepen
External link:
http://arxiv.org/abs/2410.12934
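The Best-of-N procedure summarized in the abstract above (sample several solutions, score each, keep the highest-scored one) can be sketched generically as follows. This is an illustration under stated assumptions, not the paper's method. `generate` and `score` are hypothetical stand-ins for an LLM sampler and a scoring function such as a reward model.

```python
import random
from typing import Callable, List, Tuple


def best_of_n(
    generate: Callable[[str], str],
    score: Callable[[str, str], float],
    problem: str,
    n: int,
) -> Tuple[str, List[Tuple[str, float]]]:
    """Generic Best-of-N selection (illustrative sketch, not the paper's method).

    `generate` and `score` are hypothetical stand-ins for an LLM sampler and a
    scoring function such as a reward model.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # Draw n candidate solutions with independent calls to the sampler.
    candidates = [generate(problem) for _ in range(n)]
    # Score every candidate against the problem.
    scored = [(c, score(problem, c)) for c in candidates]
    # Keep the highest-scored candidate; max() returns the first one on ties.
    best, _ = max(scored, key=lambda pair: pair[1])
    return best, scored


if __name__ == "__main__":
    # Toy example: the "solutions" are random integers, and the hypothetical
    # scorer prefers answers close to 7.
    rng = random.Random(0)
    answer, all_scored = best_of_n(
        generate=lambda p: str(rng.randint(0, 10)),
        score=lambda p, c: -abs(int(c) - 7),
        problem="toy problem",
        n=8,
    )
    print(answer, all_scored)
```

In this sketch each of the N candidates costs one independent sampler call, so the cost grows linearly with N.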
We study the sample complexity of the prototypical tasks of quantum purity estimation and quantum inner product estimation. In purity estimation, we are to estimate $\mathrm{tr}(\rho^2)$ of an unknown quantum state $\rho$ to additive error $\epsilon$. Meanwhile,
External link:
http://arxiv.org/abs/2410.12712
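For reference, the quantity targeted by purity estimation, $\mathrm{tr}(\rho^2)$, can be computed directly when the density matrix is known explicitly. The sketch below does only that; it is not an estimator from copies of an unknown state, which is the setting the abstract studies. It assumes the density matrix is given as a square nested list of (possibly complex) numbers.

```python
from typing import List, Union

Number = Union[int, float, complex]
Matrix = List[List[Number]]


def purity(rho: Matrix) -> float:
    """Return tr(rho^2) for an explicitly given density matrix.

    tr(rho^2) = sum_{i,j} rho[i][j] * rho[j][i]. For a valid d x d density
    matrix this value is real and lies in [1/d, 1]: it equals 1 for a pure
    state and 1/d for the maximally mixed state.
    """
    d = len(rho)
    if any(len(row) != d for row in rho):
        raise ValueError("rho must be a square matrix")
    total = sum(rho[i][j] * rho[j][i] for i in range(d) for j in range(d))
    # Drop the (numerically zero) imaginary part for Hermitian input.
    return complex(total).real


if __name__ == "__main__":
    pure_zero = [[1, 0], [0, 0]]                  # |0><0|
    maximally_mixed = [[0.5, 0], [0, 0.5]]        # I/2
    pure_plus_i = [[0.5, -0.5j], [0.5j, 0.5]]     # |+i><+i|
    for name, rho in [("|0>", pure_zero), ("I/2", maximally_mixed), ("|+i>", pure_plus_i)]:
        print(name, purity(rho))
```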
Despite the remarkable success of Large Language Models (LLMs), evaluating their outputs' quality regarding preference remains a critical challenge. Existing works usually leverage a powerful LLM (e.g., GPT-4) as the judge for comparing LLMs' output p
External link:
http://arxiv.org/abs/2410.12869
Authors:
Zhang, Shenao, Liu, Zhihan, Liu, Boyi, Zhang, Yufeng, Yang, Yingxiang, Liu, Yongfei, Chen, Liyu, Sun, Tao, Wang, Zhaoran
Preference alignment in Large Language Models (LLMs) has significantly improved their ability to adhere to human instructions and intentions. However, existing direct alignment algorithms primarily focus on relative preferences and often overlook the
External link:
http://arxiv.org/abs/2410.08067
Evaluating the ability of large language models (LLMs) to follow complex human-written instructions is essential for their deployment in real-world applications. While benchmarks like Chatbot Arena use human judges to assess model performance, they a
External link:
http://arxiv.org/abs/2410.06089
Understanding quantum noise is an essential step towards building practical quantum information processing systems. Pauli noise is a useful model that has been widely applied in quantum benchmarking, error mitigation, and error correction. Despite in
External link:
http://arxiv.org/abs/2410.03906
Authors:
Jia, Mengzhao, Yu, Wenhao, Ma, Kaixin, Fang, Tianqing, Zhang, Zhihan, Ouyang, Siru, Zhang, Hongming, Jiang, Meng, Yu, Dong
Text-rich images, where text serves as the central visual element guiding the overall understanding, are prevalent in real-world applications, such as presentation slides, scanned documents, and webpage snapshots. Tasks involving multiple text-rich i
External link:
http://arxiv.org/abs/2410.01744
We propose Dual Approximation Policy Optimization (DAPO), a framework that incorporates general function approximation into policy mirror descent methods. In contrast to the popular approach of using the $L_2$-norm to measure function approximation e
External link:
http://arxiv.org/abs/2410.01249
We address the challenge of online Reinforcement Learning from Human Feedback (RLHF) with a focus on self-rewarding alignment methods. In online RLHF, obtaining feedback requires interaction with the environment, which can be costly when using additi
External link:
http://arxiv.org/abs/2409.17534