Search results: "Burke, Ciara"

Report

QuACK: A Multipurpose Queuing Algorithm for Cooperative $k$-Armed Bandits

Authors: Howson, Benjamin; Filippi, Sarah; Pike-Burke, Ciara

We study the cooperative stochastic $k$-armed bandit problem, where a network of $m$ agents collaborate to find the optimal action. In contrast to most prior work on this problem, which focuses on extending a specific algorithm to the multi-agent set…

External link: http://arxiv.org/abs/2410.23867

Report

Sample-Efficiency in Multi-Batch Reinforcement Learning: The Need for Dimension-Dependent Adaptivity

Authors: Johnson, Emmeran; Pike-Burke, Ciara; Rebeschini, Patrick

We theoretically explore the relationship between sample-efficiency and adaptivity in reinforcement learning. An algorithm is sample-efficient if it uses a number of queries $n$ to the environment that is polynomial in the dimension $d$ of the proble…

External link: http://arxiv.org/abs/2310.01616

Report

Trading-Off Payments and Accuracy in Online Classification with Paid Stochastic Experts

Authors: van der Hoeven, Dirk; Pike-Burke, Ciara; Qiu, Hao; Cesa-Bianchi, Nicolo

We investigate online classification with paid stochastic experts. Here, before making their prediction, each expert must be paid. The amount that we pay each expert directly influences the accuracy of their prediction through some unknown Lipschitz…

External link: http://arxiv.org/abs/2307.00836

Report

Optimal Convergence Rate for Exact Policy Mirror Descent in Discounted Markov Decision Processes

Authors: Johnson, Emmeran; Pike-Burke, Ciara; Rebeschini, Patrick

Policy Mirror Descent (PMD) is a general family of algorithms that covers a wide range of novel and fundamental methods in reinforcement learning. Motivated by the instability of policy iteration (PI) with inexact policy evaluation, PMD algorithmical…

External link: http://arxiv.org/abs/2302.11381

Report

Delayed Feedback in Kernel Bandits

Authors: Vakili, Sattar; Ahmed, Danyal; Bernacchia, Alberto; Pike-Burke, Ciara

Black box optimisation of an unknown function from expensive and noisy evaluations is a ubiquitous problem in machine learning, academic research and industrial production. An abstraction of the problem can be formulated as a kernel based bandit prob…

External link: http://arxiv.org/abs/2302.00392

Report

Active Learning for Quantum Mechanical Measurements

Authors: Zhu, Ruidi; Pike-Burke, Ciara; Mintert, Florian

The experimental evaluation of many quantum mechanical quantities requires the estimation of several directly measurable observables, such as local observables. Due to the necessity to repeat experiments on individual quantum systems in order to esti…

External link: http://arxiv.org/abs/2212.07513

Report

Delayed Feedback in Generalised Linear Bandits Revisited

Authors: Howson, Benjamin; Pike-Burke, Ciara; Filippi, Sarah

The stochastic generalised linear bandit is a well-understood model for sequential decision-making problems, with many algorithms achieving near-optimal regret guarantees under immediate feedback. However, the stringent requirement for immediate rewa…

External link: http://arxiv.org/abs/2207.10786

Report

Bandit problems with fidelity rewards

Authors: Lugosi, Gábor; Pike-Burke, Ciara; Savalle, Pierre-André

The fidelity bandits problem is a variant of the $K$-armed bandit problem in which the reward of each arm is augmented by a fidelity reward that provides the player with an additional payoff depending on how 'loyal' the player has been to that arm in…

External link: http://arxiv.org/abs/2111.13026

Report

Optimism and Delays in Episodic Reinforcement Learning

Authors: Howson, Benjamin; Pike-Burke, Ciara; Filippi, Sarah

There are many algorithms for regret minimisation in episodic reinforcement learning. This problem is well-understood from a theoretical perspective, providing that the sequences of states, actions and rewards associated with each episode are availab…

External link: http://arxiv.org/abs/2111.07615

Report

Local Differential Privacy for Regret Minimization in Reinforcement Learning

Authors: Garcelon, Evrard; Perchet, Vianney; Pike-Burke, Ciara; Pirotta, Matteo

Reinforcement learning algorithms are widely used in domains where it is desirable to provide a personalized service. In these domains it is common that user data contains sensitive information that needs to be protected from third parties. Motivated…

External link: http://arxiv.org/abs/2010.07778