Showing 1 - 10 of 1,652 for search: '"P Van Roy"'
Author:
Marklund, Henrik, Van Roy, Benjamin
As AI agents generate increasingly sophisticated behaviors, manually encoding human preferences to guide these agents becomes more challenging. To address this, it has been suggested that agents instead learn preferences from human choice data. This…
External link:
http://arxiv.org/abs/2410.22690
Author:
Jeon, Hong Jun, Van Roy, Benjamin
The staggering feats of AI systems have brought to attention the topic of AI Alignment: aligning a "superintelligent" AI agent's actions with humanity's interests. Many existing frameworks/algorithms in alignment study the problem on a myopic horizon…
External link:
http://arxiv.org/abs/2410.14807
The "small agent, big world" frame offers a conceptual view that motivates the need for continual learning. The idea is that a small agent operating in a much bigger world cannot store all information that the world has to offer. To perform well, the
External link:
http://arxiv.org/abs/2408.02930
Author:
Jeon, Hong Jun, Van Roy, Benjamin
The staggering progress of machine learning in the past decade has been a sight to behold. In retrospect, it is both remarkable and unsettling that these milestones were achievable with little to no rigorous theory to guide experimentation. Despite t…
External link:
http://arxiv.org/abs/2407.12288
A default assumption in the design of reinforcement-learning algorithms is that a decision-making agent always explores to learn optimal behavior. In sufficiently complex environments that approach the vastness and scale of the real world, however, a…
External link:
http://arxiv.org/abs/2407.12185
A sequential decision-making agent balances between exploring to gain new knowledge about an environment and exploiting current knowledge to maximize immediate reward. For environments studied in the traditional literature, optimal decisions gravitat…
External link:
http://arxiv.org/abs/2407.12178
Author:
Jeon, Hong Jun, Van Roy, Benjamin
Neural scaling laws aim to characterize how out-of-sample error behaves as a function of model and training dataset size. Such scaling laws guide allocation of computational resources between model and data processing to minimize error. However, ex…
External link:
http://arxiv.org/abs/2407.01456
For continuous decision spaces, nonlinear programs (NLPs) can be efficiently solved via sequential quadratic programming (SQP) and, more generally, sequential convex programming (SCP). These algorithms linearize only the nonlinear equality constraint…
External link:
http://arxiv.org/abs/2404.11786
We present evidence of substantial benefit from efficient exploration in gathering human feedback to improve large language models. In our experiments, an agent sequentially generates queries while fitting a reward model to the feedback received. Our…
External link:
http://arxiv.org/abs/2402.00396
Previous theoretical results pertaining to meta-learning on sequences build on contrived assumptions and are somewhat convoluted. We introduce new information-theoretic tools that lead to an elegant and very general decomposition of error into three…
External link:
http://arxiv.org/abs/2401.15530