Search results

Report

Stochastic Approximation with Two Time Scales: The General Case

Author: Borkar, Vivek S.

Two time scale stochastic approximation is analyzed when the iterates on either or both time scales do not necessarily converge.
Comment: 6 pages

External link: http://arxiv.org/abs/2412.19872

Report

Lagrangian Index Policy for Restless Bandits with Average Reward

Authors: Avrachenkov, Konstantin; Borkar, Vivek S.; Shah, Pratik

We study the Lagrangian Index Policy (LIP) for restless multi-armed bandits with long-run average reward. In particular, we compare the performance of LIP with the performance of the Whittle Index Policy (WIP), both heuristic policies known to be asy

External link: http://arxiv.org/abs/2412.12641

Report

Whittle Index Based User Association in Dense Millimeter Wave Networks

Authors: Nalavade, Mandar R.; Kasbekar, Gaurav S.; Borkar, Vivek S.

We address the problem of user association in a dense millimeter wave (mmWave) network, in which each arriving user brings a file containing a random number of packets and each time slot is divided into multiple mini-slots. This problem is an instanc

External link: http://arxiv.org/abs/2403.09279

Report

A Concentration Bound for TD(0) with Function Approximation

Authors: Chandak, Siddharth; Borkar, Vivek S.

We derive a concentration bound of the type `for all $n \geq n_0$ for some $n_0$' for TD(0) with linear function approximation. We work with online TD learning with samples from a single sample path of the underlying Markov chain. This makes our anal

External link: http://arxiv.org/abs/2312.10424

Report

Approximation of Convex Envelope Using Reinforcement Learning

Authors: Borkar, Vivek S.; Akarsh, Adit

Oberman gave a stochastic control formulation of the problem of estimating the convex envelope of a non-convex function. Based on this, we develop a reinforcement learning scheme to approximate the convex envelope, using a variant of Q-learning for c

External link: http://arxiv.org/abs/2311.14421

Report

Decentralised Q-Learning for Multi-Agent Markov Decision Processes with a Satisfiability Criterion

Authors: Keval, Keshav P.; Borkar, Vivek S.

In this paper, we propose a reinforcement learning algorithm to solve a multi-agent Markov decision process (MMDP). The goal, inspired by Blackwell's Approachability Theorem, is to lower the time average cost of each agent to below a pre-specified ag

External link: http://arxiv.org/abs/2311.12613

Report

Node Cardinality Estimation in the Internet of Things Using Privileged Feature Distillation

Authors: Page, Pranav S.; Siyote, Anand S.; Borkar, Vivek S.; Kasbekar, Gaurav S.

The Internet of Things (IoT) is emerging as a critical technology to connect resource-constrained devices such as sensors and actuators as well as appliances to the Internet. In this paper, we propose a novel methodology for node cardinality estimati

External link: http://arxiv.org/abs/2310.18664

Report

Controlled Martingale Problems And Their Markov Mimics

Authors: Athreya, Siva; Borkar, Vivek S.; Gadhiwala, Nitya

In this article we prove under suitable assumptions that the marginals of any solution to a relaxed controlled martingale problem on a Polish space $E$ can be mimicked by a Markovian solution of a Markov-relaxed controlled martingale problem. We also

External link: http://arxiv.org/abs/2309.00488

Report

Ergodic Risk-sensitive control -- A survey

Authors: Biswas, Anup; Borkar, Vivek S.

Risk-sensitive control has received considerable interest since the seminal work of Howard and Matheson [120] because of its ability to account for fluctuations about the mean, its connection with $H_\infty$ control, and its application to financial

External link: http://arxiv.org/abs/2301.00224

Report

Reinforcement Learning in Non-Markovian Environments

Authors: Chandak, Siddharth; Shah, Pratik; Borkar, Vivek S.; Dodhia, Parth

Motivated by the novel paradigm developed by Van Roy and coauthors for reinforcement learning in arbitrary non-Markovian environments, we propose a related formulation and explicitly pin down the error caused by non-Markovianity of observations when

External link: http://arxiv.org/abs/2211.01595
