Showing 1 - 10 of 24 for search: '"Dimakopoulou, Maria"'
Contextual bandits are widely used in industrial personalization systems. These online learning frameworks learn a treatment assignment policy in the presence of treatment effects that vary with the observed contextual features of the users. …
External link:
http://arxiv.org/abs/2205.04528
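The contextual-bandit loop this abstract refers to can be illustrated with a minimal sketch: an epsilon-greedy agent that keeps a per-arm ridge-regression model of the reward and acts on the arm with the highest predicted reward. This is a generic illustration on assumed toy data, not the method of the paper; the two-arm reward function and all parameters here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy_bandit(contexts, reward_fn, n_arms=2, epsilon=0.1):
    """Epsilon-greedy contextual bandit with per-arm ridge-regression
    reward models; returns the cumulative reward."""
    d = contexts.shape[1]
    A = [np.eye(d) for _ in range(n_arms)]    # ridge Gram matrix per arm
    b = [np.zeros(d) for _ in range(n_arms)]  # X^T y per arm
    total = 0.0
    for x in contexts:
        if rng.random() < epsilon:
            arm = int(rng.integers(n_arms))   # explore uniformly
        else:
            preds = [x @ np.linalg.solve(A[k], b[k]) for k in range(n_arms)]
            arm = int(np.argmax(preds))       # exploit the model
        r = reward_fn(x, arm)
        A[arm] += np.outer(x, x)              # online model update
        b[arm] += r * x
        total += r
    return total

# hypothetical environment: arm 1 is optimal when x[0] > 0, arm 0 otherwise
contexts = rng.normal(size=(500, 2))
reward_fn = lambda x, arm: 1.0 if arm == int(x[0] > 0) else 0.0
total = epsilon_greedy_bandit(contexts, reward_fn)
```

Because the reward depends on the context, the learned policy substantially outperforms the 0.5-per-round expected reward of uniformly random arm selection.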
Empirical risk minimization (ERM) is the workhorse of machine learning, whether for classification and regression or for off-policy policy learning, but its model-agnostic guarantees can fail when we use adaptively collected data, such as the result…
External link:
http://arxiv.org/abs/2106.01723
Contextual bandit algorithms are increasingly replacing non-adaptive A/B tests in e-commerce, healthcare, and policymaking because they can both improve outcomes for study participants and increase the chance of identifying good or even best policies…
External link:
http://arxiv.org/abs/2106.00418
During online decision making in Multi-Armed Bandits (MAB), one needs to conduct inference on the true mean reward of each arm based on data collected so far at each step. However, since the arms are adaptively selected -- thereby yielding non-iid data…
External link:
http://arxiv.org/abs/2102.13202
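The non-iid issue this abstract raises is easy to see in simulation: when arms are selected greedily, the ordinary sample mean of an arm's reward is biased downward, even when all arms are identical. A minimal sketch of that phenomenon (not the paper's inference procedure; the greedy rule, horizon, and Gaussian rewards are assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

def greedy_run(T=20):
    """One greedy two-arm bandit run with both true means equal to 0;
    returns the final sample mean of arm 0."""
    sums = np.zeros(2)
    counts = np.zeros(2)
    for k in range(2):                       # pull each arm once
        sums[k] += rng.normal(0.0, 1.0)
        counts[k] += 1
    for _ in range(T - 2):                   # then always pull the leader
        arm = int(np.argmax(sums / counts))
        sums[arm] += rng.normal(0.0, 1.0)
        counts[arm] += 1
    return sums[0] / counts[0]

# averaging over many replications: the sample mean is biased below
# the true mean of 0, because unlucky arms stop being sampled
bias = float(np.mean([greedy_run() for _ in range(5000)]))
```

A naive confidence interval centered at this sample mean would therefore be systematically miscentered, which is the motivation for adaptive-inference corrections.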
We consider adaptive designs for a trial involving N individuals that we follow along T time steps. We allow for the variables of one individual to depend on its past and on the past of other individuals. Our goal is to learn a mean outcome, averaged…
External link:
http://arxiv.org/abs/2101.07380
Published in:
International Conference on Machine Learning (2020)
We propose a new framework for designing estimators for off-policy evaluation in contextual bandits. Our approach is based on the asymptotically optimal doubly robust estimator, but we shrink the importance weights to minimize a bound on the mean squared error…
External link:
http://arxiv.org/abs/1907.09623
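The doubly robust estimator this abstract starts from combines an outcome-model prediction with importance-weighted residuals on the logged actions; the paper then shrinks the weights to trade bias for variance. A minimal sketch using plain weight clipping as a stand-in for the paper's optimized shrinkage (the toy data, the clip threshold, and the function name are assumptions):

```python
import numpy as np

def dr_value(rewards, actions, q_hat, pi, propensities, clip=10.0):
    """Doubly robust off-policy value estimate with clipped importance
    weights. q_hat[i, a]: outcome-model prediction; pi[i, a]: target-policy
    probabilities; propensities[i]: logging probability of the logged action."""
    idx = np.arange(len(rewards))
    # shrunk (here: clipped) importance weights for the logged actions
    w = np.minimum(pi[idx, actions] / propensities, clip)
    model_term = (pi * q_hat).sum(axis=1)            # value under the model
    correction = w * (rewards - q_hat[idx, actions]) # weighted residuals
    return float(np.mean(model_term + correction))

# toy logged data: uniform logging over 2 arms, deterministic rewards
rng = np.random.default_rng(0)
n = 200
actions = rng.integers(2, size=n)
true_q = np.array([1.0, 2.0])
rewards = true_q[actions]
q_hat = np.tile(true_q, (n, 1))      # outcome model happens to be exact
pi = np.tile([0.0, 1.0], (n, 1))     # target policy: always pull arm 1
propensities = np.full(n, 0.5)
value = dr_value(rewards, actions, q_hat, pi, propensities)
```

With an exact outcome model the residual correction vanishes, so the estimate recovers the target policy's true value of 2.0 regardless of the clipping level; shrinkage only matters when the model is misspecified and the weights are large.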
Contextual bandit algorithms are sensitive to the estimation method of the outcome model as well as the exploration method used, particularly in the presence of rich heterogeneity or complex outcome models, which can lead to difficult estimation problems…
External link:
http://arxiv.org/abs/1812.06227
We consider a team of reinforcement learning agents that concurrently operate in a common environment, and we develop an approach to efficient coordinated exploration that is suitable for problems of practical scale. Our approach builds on seed sampling…
External link:
http://arxiv.org/abs/1805.08948
Author:
Dimakopoulou, Maria, Van Roy, Benjamin
Published in:
Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 1271-1279, Stockholmsmässan, Stockholm, Sweden, 10-15 Jul 2018
We consider a team of reinforcement learning agents that concurrently learn to operate in a common environment. We identify three properties - adaptivity, commitment, and diversity - which are necessary for efficient coordinated exploration and demonstrate…
External link:
http://arxiv.org/abs/1802.01282
Contextual bandit algorithms are sensitive to the estimation method of the outcome model as well as the exploration method used, particularly in the presence of rich heterogeneity or complex outcome models, which can lead to difficult estimation problems…
External link:
http://arxiv.org/abs/1711.07077