Showing 1 - 10 of 20
for search: '"Mukherjee, Subhojyoti"'
Learning from human feedback has been central to recent advances in artificial intelligence and machine learning. Since the collection of human feedback is costly, a natural question to ask is whether new feedback always needs to be collected. Or could w…
External link:
http://arxiv.org/abs/2406.10030
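The snippet cuts off, but the question it raises, whether existing feedback can substitute for newly collected feedback, comes down to fitting a preference model on an archive of old comparisons. A minimal sketch of that step, assuming a logistic (Bradley-Terry-style) model and synthetic features; none of the names or data come from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical archive of old pairwise comparisons: item features and labels.
# y = 1 means the first item of the pair was preferred.
d, n = 5, 200
X_a, X_b = rng.normal(size=(n, d)), rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(X_a - X_b) @ w_true))).astype(float)

# Fit a logistic preference model on the *existing* feedback via gradient ascent;
# if this model is already accurate, new annotations may be unnecessary.
w = np.zeros(d)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X_a - X_b) @ w))
    w += 0.1 * (X_a - X_b).T @ (y - p) / n

print("cosine between recovered and true preference direction:",
      w @ w_true / (np.linalg.norm(w) * np.linalg.norm(w_true)))
```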
In this paper, we study the multi-task structured bandit problem, where the goal is to learn a near-optimal algorithm that minimizes cumulative regret. The tasks share a common structure, and the algorithm exploits the shared structure to minimize the cumu…
External link:
http://arxiv.org/abs/2406.05064
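As a toy illustration of why shared structure helps, the sketch below pools observations across tasks whose arm means differ only by small offsets, so every task's pulls tighten every other task's confidence intervals. The hierarchical setup and the UCB rule are illustrative assumptions, not the paper's algorithm:

```python
import numpy as np

rng = np.random.default_rng(1)
K, tasks, T = 5, 4, 2000

# Illustrative shared structure: all tasks draw their arm means from a
# common vector plus small task-specific offsets.
shared = rng.normal(size=K)
means = shared + 0.1 * rng.normal(size=(tasks, K))

# Pool pulls across tasks: counts and reward sums are shared.
counts, sums, regret = np.zeros(K), np.zeros(K), 0.0
for t in range(T):
    task = t % tasks
    if t < K:                                  # pull each arm once to initialize
        a = t
    else:
        ucb = sums / counts + np.sqrt(2.0 * np.log(t) / counts)
        a = int(np.argmax(ucb))
    r = rng.normal(means[task, a])
    counts[a] += 1
    sums[a] += r
    regret += means[task].max() - means[task, a]

print(f"cumulative regret over {T} rounds: {regret:.1f}")
```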
In this paper, we study safe data collection for the purpose of policy evaluation in tabular Markov decision processes (MDPs). In policy evaluation, we are given a target policy and asked to estimate the expected cumulative reward it will ob…
External link:
http://arxiv.org/abs/2406.02165
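The basic estimator behind policy evaluation in a tabular MDP is easy to sketch: roll the target policy out and average the returns. The random MDP below is hypothetical, and the sketch ignores the safety constraints that are the paper's actual subject:

```python
import numpy as np

rng = np.random.default_rng(2)
S, A, H = 4, 2, 10  # states, actions, horizon

# A small random tabular MDP (illustrative, not from the paper).
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a] is a distribution over next states
R = rng.random((S, A))                       # mean reward for (s, a)
pi = rng.dirichlet(np.ones(A), size=S)       # target policy: pi[s] over actions

def rollout():
    s, ret = 0, 0.0
    for _ in range(H):
        a = rng.choice(A, p=pi[s])
        ret += R[s, a]
        s = rng.choice(S, p=P[s, a])
    return ret

# Monte Carlo estimate of the target policy's expected cumulative reward.
estimate = np.mean([rollout() for _ in range(5000)])
print(f"estimated value of the target policy: {estimate:.3f}")
```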
Author:
Mukherjee, Subhojyoti, Lalitha, Anusha, Kalantari, Kousha, Deshmukh, Aniket, Liu, Ge, Ma, Yifei, Kveton, Branislav
Learning of preference models from human feedback has been central to recent advances in artificial intelligence. Motivated by the cost of obtaining high-quality human annotations, we study the problem of data collection for learning preference model…
External link:
http://arxiv.org/abs/2404.13895
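A common way to decide which comparison to request next is uncertainty sampling: query the pair the current model is least sure about. The sketch below uses the predicted-probability-closest-to-1/2 rule as a stand-in for the paper's selection criterion; the features and model estimate are synthetic:

```python
import numpy as np

rng = np.random.default_rng(3)
d, n_items = 4, 30
X = rng.normal(size=(n_items, d))        # hypothetical item features
w_hat = 0.1 * rng.normal(size=d)         # current preference-model estimate

# Score every candidate pair by model uncertainty: a predicted preference
# probability near 1/2 marks the most informative query under this heuristic.
best, best_gap = None, np.inf
for i in range(n_items):
    for j in range(i + 1, n_items):
        p = 1.0 / (1.0 + np.exp(-(X[i] - X[j]) @ w_hat))
        if abs(p - 0.5) < best_gap:
            best, best_gap = (i, j), abs(p - 0.5)

print("next pair to send to the annotator:", best)
```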
Author:
Mukherjee, Subhojyoti, Lalitha, Anusha, Deshmukh, Aniket, Liu, Ge, Ma, Yifei, Kveton, Branislav
One emergent ability of large language models (LLMs) is that query-specific examples can be included in the prompt at inference time. In this work, we use active learning for adaptive prompt design and call it Active In-context Prompt Design (AIPD).
External link:
http://arxiv.org/abs/2404.08846
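To make the idea concrete, the sketch below picks in-context examples for a query by embedding similarity, which is a standard baseline rather than the AIPD algorithm itself; AIPD selects examples adaptively with active learning, which this sketch does not do:

```python
import numpy as np

rng = np.random.default_rng(4)
d, pool = 8, 100
examples = rng.normal(size=(pool, d))    # embeddings of a labeled example pool
query = rng.normal(size=d)               # embedding of the incoming query

# Pick the k examples most similar to the query to include in the prompt.
k = 4
sims = examples @ query / (np.linalg.norm(examples, axis=1) * np.linalg.norm(query))
chosen = np.argsort(-sims)[:k]
print("indices of in-context examples for this query:", chosen)
```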
We study multi-task representation learning for the problem of pure exploration in bilinear bandits. In bilinear bandits, an action takes the form of a pair of arms from two different entity types and the reward is a bilinear function of the known fe…
External link:
http://arxiv.org/abs/2311.00327
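In a bilinear bandit the reward of an arm pair (x, z) is x^T Θ z for an unknown matrix Θ, so estimating Θ reduces to linear regression on vectorized outer products. A minimal sketch with synthetic data; it ignores the low-rank and multi-task structure the paper exploits:

```python
import numpy as np

rng = np.random.default_rng(5)
d1, d2, n = 3, 4, 500
Theta = rng.normal(size=(d1, d2))        # unknown bilinear parameter

# Observed triples: left-arm features x, right-arm features z, and noisy
# reward r = x^T Theta z + noise. Vectorizing makes this plain regression.
X = rng.normal(size=(n, d1))
Z = rng.normal(size=(n, d2))
feats = np.stack([np.outer(x, z).ravel() for x, z in zip(X, Z)])
r = feats @ Theta.ravel() + 0.1 * rng.normal(size=n)

Theta_hat = np.linalg.lstsq(feats, r, rcond=None)[0].reshape(d1, d2)
print("estimation error:", np.linalg.norm(Theta_hat - Theta))
```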
Motivated by the importance of explainability in modern machine learning, we design bandit algorithms that are efficient and interpretable. A bandit algorithm is interpretable if it explores with the objective of reducing uncertainty in the unknown m…
External link:
http://arxiv.org/abs/2310.14751
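One directly explainable exploration rule is to always pull the arm with the widest confidence interval: each decision can be justified as "this arm had the most remaining uncertainty." The sketch below implements that rule for mean estimation; it illustrates the idea, not the paper's algorithms:

```python
import numpy as np

rng = np.random.default_rng(6)
K, T = 5, 1000
means = rng.random(K)                 # unknown arm means (synthetic)
counts = np.ones(K)
sums = rng.normal(means)              # one initial pull per arm

for t in range(T):
    width = np.sqrt(2.0 * np.log(t + 2) / counts)   # CI half-width per arm
    a = int(np.argmax(width))         # pull the arm we know least about
    counts[a] += 1
    sums[a] += rng.normal(means[a])

print("pulls per arm:", counts.astype(int))
print("mean estimates:", np.round(sums / counts, 2))
```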
In this paper, we study the problem of optimal data collection for policy evaluation in linear bandits. In policy evaluation, we are given a target policy and asked to estimate the expected reward it will obtain when executed in a multi-armed bandit…
External link:
http://arxiv.org/abs/2301.12357
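The plug-in estimator this data-collection problem targets is simple: fit the reward parameter by least squares and average the predicted arm rewards under the target policy. The sketch collects data uniformly, the naive baseline the paper improves on by optimizing the sampling distribution:

```python
import numpy as np

rng = np.random.default_rng(7)
d, K, n = 3, 6, 400
X = rng.normal(size=(K, d))              # known arm features
theta = rng.normal(size=d)               # unknown reward parameter
pi = rng.dirichlet(np.ones(K))           # target policy over arms

# Collect rewards by sampling arms uniformly, then fit by least squares.
arms = rng.integers(K, size=n)
r = X[arms] @ theta + 0.1 * rng.normal(size=n)
theta_hat = np.linalg.lstsq(X[arms], r, rcond=None)[0]

v_hat = pi @ (X @ theta_hat)             # estimate of E_{a~pi}[x_a^T theta]
v_true = pi @ (X @ theta)
print(f"estimated {v_hat:.3f} vs true {v_true:.3f}")
```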
Author:
Mukherjee, Subhojyoti
In this paper, we consider the setting of piecewise i.i.d. bandits under a safety constraint. In this piecewise i.i.d. setting, there exists a finite number of changepoints where the means of some or all arms change simultaneously. We introduce the sa…
External link:
http://arxiv.org/abs/2205.13689
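A standard device for piecewise i.i.d. rewards is a sliding window: only recent rewards feed the estimates, so the algorithm recovers after a changepoint. The windowed UCB below is a generic heuristic and omits the safety constraint that distinguishes the paper's setting:

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(8)
K, T, window = 2, 2000, 200

def means(t):                      # one changepoint: the best arm switches
    return np.array([0.3, 0.7]) if t < T // 2 else np.array([0.8, 0.2])

recent = [deque(maxlen=window) for _ in range(K)]
for t in range(T):
    # Windowed UCB: only the last `window` rewards per arm are trusted,
    # so estimates recover quickly after a changepoint.
    ucb = np.array([
        np.mean(buf) + np.sqrt(2 * np.log(min(t + 2, window + 1)) / len(buf))
        if buf else np.inf
        for buf in recent
    ])
    a = int(np.argmax(ucb))
    recent[a].append(rng.normal(means(t)[a], 0.1))

print("windowed mean estimates after the change:",
      [round(float(np.mean(buf)), 2) for buf in recent])
```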
This paper studies the problem of data collection for policy evaluation in Markov decision processes (MDPs). In policy evaluation, we are given a target policy and asked to estimate the expected cumulative reward it will obtain in an environment form…
External link:
http://arxiv.org/abs/2203.04510
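When the evaluation data are collected by a different (behavior) policy, the standard correction is importance sampling, and the variance of the weights is exactly what makes the choice of data-collecting policy matter. A minimal trajectory-wise sketch on a random MDP, hypothetical rather than the paper's method:

```python
import numpy as np

rng = np.random.default_rng(9)
S, A, H = 3, 2, 5
P = rng.dirichlet(np.ones(S), size=(S, A))   # transition kernel
R = rng.random((S, A))                       # mean reward for (s, a)
pi_t = rng.dirichlet(np.ones(A), size=S)     # target policy
pi_b = np.full((S, A), 1.0 / A)              # behavior policy collecting the data

def is_estimate():
    s, ret, w = 0, 0.0, 1.0
    for _ in range(H):
        a = rng.choice(A, p=pi_b[s])
        w *= pi_t[s, a] / pi_b[s, a]         # importance weight for this step
        ret += R[s, a]
        s = rng.choice(S, p=P[s, a])
    return w * ret                           # trajectory-wise IS estimate

est = np.mean([is_estimate() for _ in range(20000)])
print(f"off-policy estimate of the target policy's value: {est:.3f}")
```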