Showing 1 - 10 of 196 results for search: '"Foerster, Jakob"'
Given the widespread adoption and usage of Large Language Models (LLMs), it is crucial to have flexible and interpretable evaluations of their instruction-following ability. Preference judgments between model outputs have become the de facto evaluation...
External link: http://arxiv.org/abs/2410.03608
Authors: Towers, Sebastian, Kalisz, Aleksandra, Robert, Philippe A., Higueruelo, Alicia, Vianello, Francesca, Tsai, Ming-Han Chloe, Steel, Harrison, Foerster, Jakob N.
Anti-viral therapies are typically designed to target only the current strains of a virus. Game theoretically, this corresponds to a short-sighted, or myopic, response. However, therapy-induced selective pressures act on viruses to drive the emergence...
External link: http://arxiv.org/abs/2409.10588
Authors: Lupidi, Alisia, Gemmell, Carlos, Cancedda, Nicola, Dwivedi-Yu, Jane, Weston, Jason, Foerster, Jakob, Raileanu, Roberta, Lomeli, Maria
Large Language Models still struggle in challenging scenarios that leverage structured data, complex reasoning, or tool usage. In this paper, we propose Source2Synth: a new method that can be used for teaching LLMs new skills without relying on costly...
External link: http://arxiv.org/abs/2409.08239
Human intelligence emerged through the process of natural selection and evolution on Earth. We investigate what it would take to re-create this process in silico. While past work has often focused on low-level processes (such as simulating physics or...
External link: http://arxiv.org/abs/2409.00853
Authors: Rutherford, Alexander, Beukman, Michael, Willi, Timon, Lacerda, Bruno, Hawes, Nick, Foerster, Jakob
What data or environments to use for training to improve downstream performance is a longstanding and very topical question in reinforcement learning. In particular, Unsupervised Environment Design (UED) methods have gained recent attention as their...
External link: http://arxiv.org/abs/2408.15099
Authors: Zhang, Qizhen, Gritsch, Nikolas, Gnaneshwar, Dwaraknath, Guo, Simon, Cairuz, David, Venkitesh, Bharat, Foerster, Jakob, Blunsom, Phil, Ruder, Sebastian, Ustun, Ahmet, Locatelli, Acyr
The Mixture of Experts (MoE) framework has become a popular architecture for large language models due to its superior performance over dense models. However, training MoEs from scratch in a large-scale regime is prohibitively expensive. Existing methods...
External link: http://arxiv.org/abs/2408.08274
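
For context on the MoE framework this abstract refers to, below is a minimal, hypothetical sketch of a generic top-k routed MoE layer in PyTorch. It illustrates only the standard architecture, not the upcycling method proposed in the paper; the module names and dimensions are illustrative assumptions.

# Minimal sketch of a generic top-k routed Mixture-of-Experts layer (PyTorch).
# This shows the standard MoE framework mentioned in the abstract, NOT the method
# of arXiv:2408.08274; all names and sizes below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # per-token expert scores
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        gate_logits = self.router(x)
        weights, expert_idx = gate_logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)  # normalize over the chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):  # send each token to its k-th chosen expert
            for e, expert in enumerate(self.experts):
                mask = expert_idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

# Usage: y = MoELayer()(torch.randn(16, 512))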
One of the grand challenges of artificial general intelligence is developing agents capable of conducting scientific research and discovering new knowledge. While frontier models have already been used as aides to human scientists, e.g. for brainstorming...
External link: http://arxiv.org/abs/2408.06292
Authors: Goldie, Alexander David, Lu, Chris, Jackson, Matthew Thomas, Whiteson, Shimon, Foerster, Jakob Nicolaus
While reinforcement learning (RL) holds great potential for decision making in the real world, it suffers from a number of unique difficulties which often need specific consideration. In particular: it is highly non-stationary; suffers from high degrees...
External link: http://arxiv.org/abs/2407.07082
Authors: Gallici, Matteo, Fellows, Mattie, Ellis, Benjamin, Pou, Bartomeu, Masmitja, Ivan, Foerster, Jakob Nicolaus, Martin, Mario
Q-learning played a foundational role in the field of reinforcement learning (RL). However, TD algorithms with off-policy data, such as Q-learning, or nonlinear function approximation like deep neural networks require several additional tricks to stabilize...
External link: http://arxiv.org/abs/2407.04811
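
For context, the TD update this abstract alludes to is the classical Q-learning rule. Below is a minimal tabular sketch, assuming a gymnasium-style discrete environment; it is not the stabilized deep TD method proposed in the paper, and the hyperparameters are illustrative assumptions.

# Minimal sketch of the classical tabular Q-learning TD update referenced in the
# abstract; NOT the deep TD method of arXiv:2407.04811. Environment interface is
# assumed to follow gymnasium's reset/step API for discrete spaces.
import numpy as np

def q_learning(env, n_episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    Q = np.zeros((env.observation_space.n, env.action_space.n))
    for _ in range(n_episodes):
        s, _ = env.reset()
        done = False
        while not done:
            # epsilon-greedy behaviour policy (off-policy w.r.t. the greedy target)
            a = env.action_space.sample() if np.random.rand() < epsilon else int(Q[s].argmax())
            s_next, r, terminated, truncated, _ = env.step(a)
            done = terminated or truncated
            # TD target bootstraps from the greedy value of the next state
            target = r + (0.0 if terminated else gamma * Q[s_next].max())
            Q[s, a] += alpha * (target - Q[s, a])  # move Q towards the TD target
            s = s_next
    return Q

# Usage (assumes gymnasium is installed):
#   import gymnasium as gym
#   Q = q_learning(gym.make("FrozenLake-v1"))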
Authors: Willi, Timon, Obando-Ceron, Johan, Foerster, Jakob, Dziugaite, Karolina, Castro, Pablo Samuel
Mixtures of Experts (MoEs) have gained prominence in (self-)supervised learning due to their enhanced inference efficiency, adaptability to distributed training, and modularity. Previous research has illustrated that MoEs can significantly boost Deep...
External link: http://arxiv.org/abs/2406.18420