Showing 1 - 10 of 160 for search: '"Gast, Nicolas"'
Author:
Gast, Nicolas, Narasimha, Dheeraj
We consider the discrete-time, infinite-horizon, average-reward restless Markovian bandit (RMAB) problem. We propose a \emph{model predictive control} based non-stationary policy with a rolling computational horizon $\tau$. At each time-slot, this poli
External link:
http://arxiv.org/abs/2410.06307
We explore a novel variant of the classical prophet inequality problem, where the values of a sequence of items are drawn i.i.d. from some distribution, and an online decision maker must select one item irrevocably. We establish that the competitive
External link:
http://arxiv.org/abs/2408.07616
Author:
Allmeier, Sebastian, Gast, Nicolas
We study stochastic approximation algorithms with Markovian noise and constant step-size $\alpha$. We develop a method based on infinitesimal generator comparisons to study the bias of the algorithm, which is the expected difference between $\theta_n
External link:
http://arxiv.org/abs/2405.14285
Author:
Allmeier, Sebastian, Gast, Nicolas
We consider a system of $N$ particles whose interactions are characterized by a (weighted) graph $G^N$. Each particle is a node of the graph with an internal state. The state changes according to Markovian dynamics that depend on the states and conne
External link:
http://arxiv.org/abs/2405.08623
We consider the problem of online allocation subject to a long-term fairness penalty. Contrary to existing works, however, we do not assume that the decision-maker observes the protected attributes -- which is often unrealistic in practice. Instead t
External link:
http://arxiv.org/abs/2306.13440
We propose the first model-free algorithm that achieves low regret performance for decentralized learning in two-player zero-sum tabular stochastic games with infinite-horizon average-reward objective. In decentralized learning, the learning agent co
External link:
http://arxiv.org/abs/2301.05630
Author:
Allmeier, Sebastian, Gast, Nicolas
Mean field approximation is a powerful technique which has been used in many settings to study large-scale stochastic systems. In the case of two-timescale systems, the approximation is obtained by a combination of scaling arguments and the use of th
External link:
http://arxiv.org/abs/2211.11382
We propose a new policy, called the LP-update policy, to solve finite-horizon weakly coupled Markov decision processes. The latter can be seen as multi-constraint multi-action bandits, and generalize the classical restless bandit problems. Our soluti
External link:
http://arxiv.org/abs/2211.01961
To better understand discrimination and the effect of affirmative action in selection problems (e.g., college admission or hiring), a recent line of research proposed a model based on differential variance. This model assumes that the decision-make
External link:
http://arxiv.org/abs/2205.12204
The Whittle index is a generalization of the Gittins index that provides very efficient allocation rules for restless multi-armed bandits. In this work, we develop an algorithm to test the indexability and compute the Whittle indices of any finite-state rest
External link:
http://arxiv.org/abs/2203.05207