Showing 1 - 10 of 1,359,689 results for search: '"A P, Best"'
Author:
Agrawal, Sanjana, Blanco, Saúl A.
We study the problem of collaborative best-arm identification in stochastic linear bandits under a fixed-budget scenario. In our learning model, we consider multiple agents connected through a star network or a generic network, interacting with a linear…
External link:
http://arxiv.org/abs/2411.13690
Author:
Reuel, Anka, Hardy, Amelia, Smith, Chandler, Lamparth, Max, Hardy, Malcolm, Kochenderfer, Mykel J.
AI models are increasingly prevalent in high-stakes environments, necessitating thorough assessment of their capabilities and risks. Benchmarks are popular for measuring these attributes and for comparing model performance, tracking progress, and identifying…
External link:
http://arxiv.org/abs/2411.12990
Given a multi-view video, which viewpoint is most informative for a human observer? Existing methods rely on heuristics or expensive "best-view" supervision to answer this question, limiting their applicability. We propose a weakly supervised approach…
External link:
http://arxiv.org/abs/2411.08753
A mean-field game (MFG) seeks the Nash Equilibrium of a game involving a continuum of players, where the Nash Equilibrium corresponds to a fixed point of the best-response mapping. However, simple fixed-point iterations do not always guarantee convergence…
External link:
http://arxiv.org/abs/2411.07989
The best arm identification problem requires identifying the best alternative (i.e., arm) in active experimentation using the smallest number of experiments (i.e., arm pulls), which is crucial for cost-efficient and timely decision-making processes.
External link:
http://arxiv.org/abs/2411.01808
Author:
Caldeira, Madalena, Moreno, Plinio
The Next Best View problem is a computer vision problem widely studied in robotics. To solve it, several methodologies have been proposed over the years. Some, more recently, propose the use of deep learning models. Predictions obtained with the help…
External link:
http://arxiv.org/abs/2411.01734
Author:
Sun, Hanshi, Haider, Momin, Zhang, Ruiqi, Yang, Huitao, Qiu, Jiahao, Yin, Ming, Wang, Mengdi, Bartlett, Peter, Zanette, Andrea
The safe and effective deployment of Large Language Models (LLMs) involves a critical step called alignment, which ensures that the model's responses are in accordance with human preferences. Prevalent alignment techniques, such as DPO, PPO, and their…
External link:
http://arxiv.org/abs/2410.20290
Author:
Qiu, Jiahao, Lu, Yifu, Zeng, Yifan, Guo, Jiacheng, Geng, Jiayi, Wang, Huazheng, Huang, Kaixuan, Wu, Yue, Wang, Mengdi
Inference-time alignment enhances the performance of large language models without requiring additional training or fine-tuning, but presents challenges in balancing computational efficiency with high-quality output. Best-of-N (BoN) sampling, as a…
External link:
http://arxiv.org/abs/2410.16033