Showing 1 - 10 of 33
for search: '"Gajane, Pratik"'
Author:
Gajane, Pratik
We introduce the problem of regret minimization in adversarial multi-dueling bandits. While adversarial preferences have been studied in dueling bandits, they have not been explored in multi-dueling bandits. In this setting, the learner is required to …
External link:
http://arxiv.org/abs/2406.12475
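As a rough illustration of the multi-dueling setting described above (not the paper's algorithm), the sketch below has a learner pick a subset of arms each round and observe pairwise preference feedback among them. The arm count, subset size, preference matrix, and uniform-random learner are all hypothetical stand-ins.

```python
import random

K, m, T = 5, 3, 100          # arms, subset size, horizon (hypothetical)
random.seed(0)

# Adversary's preference probabilities: P[i][j] is the probability
# that arm i wins a duel against arm j (placeholder values).
P = [[0.5 if i == j else (0.7 if i < j else 0.3) for j in range(K)]
     for i in range(K)]

wins = [0] * K
for t in range(T):
    chosen = random.sample(range(K), m)   # learner selects m arms
    for a in chosen:                      # observe every pairwise duel
        for b in chosen:
            if a < b:
                winner = a if random.random() < P[a][b] else b
                wins[winner] += 1

print(wins)  # duel wins per arm; lower-indexed arms are favoured by P
```

Each round yields m·(m-1)/2 duels among the chosen arms, so the feedback is richer than in ordinary (pairwise) dueling bandits.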
Chronic pain significantly diminishes the quality of life for millions worldwide. While psychoeducation and therapy can improve pain outcomes, many individuals experiencing pain lack access to evidence-based treatments or fail to complete the necessary …
External link:
http://arxiv.org/abs/2402.19226
We present a new algorithm based on posterior sampling for learning in constrained Markov decision processes (CMDP) in the infinite-horizon undiscounted setting. The algorithm achieves near-optimal regret bounds while being advantageous empirically compared …
External link:
http://arxiv.org/abs/2309.15737
Author:
Broek, Ronald C. van den, Litjens, Rik, Sagis, Tobias, Siecker, Luc, Verbeeke, Nina, Gajane, Pratik
Decision-making problems of sequential nature, where decisions made in the past may have an impact on the future, are used to model many practically important applications. In some real-world applications, feedback about a decision is delayed and may …
External link:
http://arxiv.org/abs/2303.00620
Author:
Li, Jiong, Gajane, Pratik
Sparsity of rewards while applying a deep reinforcement learning method negatively affects its sample-efficiency. A viable solution to deal with the sparsity of rewards is to learn via intrinsic motivation, which advocates for adding an intrinsic reward …
External link:
http://arxiv.org/abs/2302.10825
Author:
Gajane, Pratik
We study the problem of preserving privacy while still providing high utility in sequential decision-making scenarios in a changing environment. We consider an abruptly changing environment: the environment remains constant during periods and it changes …
External link:
http://arxiv.org/abs/2301.00561
Author:
Broek, Ronald C. van den, Litjens, Rik, Sagis, Tobias, Siecker, Luc, Verbeeke, Nina, Gajane, Pratik
In this paper, we investigate the Multi-Armed Bandit problem in the Temporally-Partitioned Rewards (TP-MAB) setting. In the TP-MAB setting, an agent will receive subsets of the reward over multiple rounds rather than the entire reward for the arm all at once …
External link:
http://arxiv.org/abs/2211.06883
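A minimal sketch of the TP-MAB reward mechanics described above, assuming (hypothetically) that each pull's reward is released in equal portions over the next tau rounds; the arm means, the partition scheme, and the uniform-random policy are illustrative, not the paper's formulation.

```python
import random

random.seed(1)
K, tau, T = 3, 4, 20            # arms, partition length, horizon (hypothetical)
means = [0.2, 0.5, 0.8]         # placeholder per-arm reward means

pending = []                    # (round_due, portion) of not-yet-received reward
collected = 0.0
for t in range(T):
    arm = random.randrange(K)   # placeholder policy: uniform pulls
    total = means[arm]          # the pull's full reward (here: its mean)
    for d in range(1, tau + 1): # spread the reward over the next tau rounds
        pending.append((t + d, total / tau))
    # the agent only receives the portions that fall due this round
    collected += sum(p for due, p in pending if due == t)
    pending = [(due, p) for due, p in pending if due != t]

print(round(collected, 3))      # reward received within the horizon
```

The key contrast with the classical setting: portions still in `pending` at the horizon were earned by earlier pulls but never observed, which is what makes credit assignment harder in TP-MAB.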
We study a posterior sampling approach to efficient exploration in constrained reinforcement learning. In contrast to existing algorithms, we propose two simple algorithms that are statistically more efficient, simpler to implement, and computationally …
External link:
http://arxiv.org/abs/2209.03596
Fairness-aware learning aims at satisfying various fairness constraints in addition to the usual performance criteria via data-driven machine learning techniques. Most of the research in fairness-aware learning employs the setting of fair-supervised learning …
External link:
http://arxiv.org/abs/2205.10032
We consider a special case of bandit problems, named batched bandits, in which an agent observes batches of responses over a certain time period. Unlike previous work, we consider a more practically relevant batch-centric scenario of batch learning.
External link:
http://arxiv.org/abs/2202.06657
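To illustrate the batch-centric scenario described above (a hypothetical sketch, not the paper's method): the agent commits to arm choices for an entire batch and only observes that batch's rewards after it completes, so its estimates are updated once per batch rather than once per pull. The arm means, batch sizes, and round-robin-then-greedy policy are all placeholder choices.

```python
import random

random.seed(2)
K, batch_size, n_batches = 3, 10, 5
means = [0.3, 0.6, 0.9]                 # hypothetical Bernoulli arm means
counts, sums = [0] * K, [0.0] * K       # per-arm pull counts and reward sums

def estimate(a):
    return sums[a] / counts[a]          # empirical mean (only called once counts[a] > 0)

for b in range(n_batches):
    # commit to arms for the whole batch using the current (stale) estimates;
    # round-robin over arms until every arm has been tried at least once
    batch = [max(range(K), key=estimate) if all(counts) else b % K
             for _ in range(batch_size)]
    # the batch's rewards arrive together, only after the batch ends
    rewards = [1.0 if random.random() < means[a] else 0.0 for a in batch]
    for a, r in zip(batch, rewards):    # a single update per batch
        counts[a] += 1
        sums[a] += r

print(counts)                           # pulls per arm across all batches
```

Because decisions inside a batch cannot react to that batch's own rewards, exploration is paid for in whole-batch increments, which is exactly what distinguishes batch learning from the fully sequential bandit setting.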