Showing 1 - 10 of 46 for search: '"Asghari, Seyed Mohammad"'
We present evidence of substantial benefit from efficient exploration in gathering human feedback to improve large language models. In our experiments, an agent sequentially generates queries while fitting a reward model to the feedback received. Our
External link:
http://arxiv.org/abs/2402.00396
Author:
Osband, Ian, Wen, Zheng, Asghari, Seyed Mohammad, Dwaracherla, Vikranth, Ibrahimi, Morteza, Lu, Xiuyuan, Van Roy, Benjamin
Thompson sampling (TS) is a popular heuristic for action selection, but it requires sampling from a posterior distribution. Unfortunately, this can become computationally intractable in complex environments, such as those modeled using neural network
External link:
http://arxiv.org/abs/2302.09205
Author:
Osband, Ian, Asghari, Seyed Mohammad, Van Roy, Benjamin, McAleese, Nat, Aslanides, John, Irving, Geoffrey
Language models often pre-train on large unsupervised text corpora, then fine-tune on additional task-specific data. However, typical fine-tuning schemes do not prioritize the examples that they tune on. We show that, if you can prioritize informativ
External link:
http://arxiv.org/abs/2211.01568
Author:
Lu, Xiuyuan, Osband, Ian, Asghari, Seyed Mohammad, Gowal, Sven, Dwaracherla, Vikranth, Wen, Zheng, Van Roy, Benjamin
Recent work introduced the epinet as a new approach to uncertainty modeling in deep learning. An epinet is a small neural network added to traditional neural networks, which, together, can produce predictive distributions. In particular, using an epi
External link:
http://arxiv.org/abs/2207.00137
Author:
Dwaracherla, Vikranth, Wen, Zheng, Osband, Ian, Lu, Xiuyuan, Asghari, Seyed Mohammad, Van Roy, Benjamin
In machine learning, an agent needs to estimate uncertainty to efficiently explore and adapt and to make effective decisions. A common approach to uncertainty estimation maintains an ensemble of models. In recent years, several approaches have been p
External link:
http://arxiv.org/abs/2206.03633
Author:
Osband, Ian, Wen, Zheng, Asghari, Seyed Mohammad, Dwaracherla, Vikranth, Lu, Xiuyuan, Van Roy, Benjamin
Most work on supervised learning research has focused on marginal predictions. In decision problems, joint predictive distributions are essential for good performance. Previous work has developed methods for assessing low-order predictive distributio
External link:
http://arxiv.org/abs/2202.13509
Author:
Osband, Ian, Wen, Zheng, Asghari, Seyed Mohammad, Dwaracherla, Vikranth, Hao, Botao, Ibrahimi, Morteza, Lawson, Dieterich, Lu, Xiuyuan, O'Donoghue, Brendan, Van Roy, Benjamin
Predictive distributions quantify uncertainties ignored by point estimates. This paper introduces The Neural Testbed: an open-source benchmark for controlled and principled evaluation of agents that generate such predictions. Crucially, the testbed a
External link:
http://arxiv.org/abs/2110.04629
Author:
Osband, Ian, Wen, Zheng, Asghari, Seyed Mohammad, Dwaracherla, Vikranth, Ibrahimi, Morteza, Lu, Xiuyuan, Van Roy, Benjamin
Intelligence relies on an agent's knowledge of what it does not know. This capability can be assessed based on the quality of joint predictions of labels across multiple inputs. In principle, ensemble-based approaches produce effective joint predicti
External link:
http://arxiv.org/abs/2107.08924
Regret analysis is challenging in Multi-Agent Reinforcement Learning (MARL) primarily due to the dynamical environments and the decentralized information among agents. We attempt to solve this challenge in the context of decentralized learning in mul
External link:
http://arxiv.org/abs/2001.10122
We consider a system comprising a file library and a network with a server and multiple users equipped with cache memories. The system operates in two phases: a prefetching phase, where users load their caches with parts of contents from the library,
External link:
http://arxiv.org/abs/1912.04321