Showing 1 - 10 of 354 for search: '"Shalabh Bhatnagar"'
Published in:
Mathematics of Operations Research. 47:2138-2159
In this paper, we consider the stochastic iterative counterpart of the value iteration scheme wherein only noisy and possibly biased approximations of the Bellman operator are available. We call this counterpart the approximate value iteration (AVI) …
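As a rough sketch of the setting described in this abstract (standard notation, assumed here; not the paper's exact scheme): the exact update V_{k+1} = T V_k is replaced by a stochastic-approximation step driven by a noisy, possibly biased estimate of the Bellman operator T,

    V_{k+1} = (1 - a_k) V_k + a_k (T V_k + w_k),

where {a_k} are step sizes and w_k collects the noise and bias in the available approximation of T V_k.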
Published in:
IEEE Transactions on Automatic Control. 67:4241-4247
Value iteration is a fixed point iteration technique utilized to obtain the optimal value function and policy in a discounted reward Markov Decision Process (MDP). Here, a contraction operator is constructed and applied repeatedly to arrive at the op…
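A minimal value iteration sketch for a small tabular MDP, assuming known transition probabilities P and expected rewards R (array names and shapes here are illustrative, not taken from the paper):

    import numpy as np

    def value_iteration(P, R, gamma=0.9, tol=1e-8):
        # P: (S, A, S) transition probabilities; R: (S, A) expected rewards.
        S, A = R.shape
        V = np.zeros(S)
        while True:
            # One application of the Bellman optimality operator T:
            # Q(s, a) = R(s, a) + gamma * sum_s' P(s, a, s') V(s').
            Q = R + gamma * (P @ V)
            V_new = Q.max(axis=1)
            # T is a gamma-contraction in the sup norm, so the iteration converges.
            if np.max(np.abs(V_new - V)) < tol:
                return V_new, Q.argmax(axis=1)  # optimal values and a greedy policy
            V = V_new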
Author:
Prasenjit Karmakar, Shalabh Bhatnagar
Published in:
IEEE Transactions on Automatic Control. 66:5941-5954
This paper compiles several aspects of the dynamics of stochastic approximation algorithms with Markov iterate-dependent noise when the iterates are not known to be stable beforehand. We achieve this by extending the lock-in probability (i.e., the …
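For orientation, the recursions studied in this line of work have the generic form (standard notation, assumed here):

    x_{n+1} = x_n + a_n [ h(x_n, Y_n) + M_{n+1} ],

where {a_n} are step sizes, Y_n is the Markov iterate-dependent noise, and M_{n+1} is a martingale-difference term.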
Published in:
2022 IEEE International Conference on Systems, Man, and Cybernetics (SMC).
Published in:
IEEE Transactions on Intelligent Transportation Systems. 22:107-118
This paper presents our method for enabling a UAV quadrotor, equipped with a monocular camera, to autonomously avoid collisions with obstacles in unstructured and unknown indoor environments. When compared to obstacle avoidance in ground vehicular ro…
Published in:
Applied Intelligence. 51:1565-1579
Zeroth Order Bayesian Optimization (ZOBO) methods optimize an unknown function based on its black-box evaluations at the query locations. Unlike most optimization procedures, ZOBO methods fail to utilize gradient information even when it is available …
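To make the "black-box evaluations only" point concrete, here is a generic zeroth-order step that forms a descent direction purely from function evaluations (a two-point random-perturbation estimator; an illustration of the zeroth-order setting, not the paper's method):

    import numpy as np

    def zeroth_order_step(f, x, delta=1e-2, lr=1e-1, rng=np.random.default_rng(0)):
        # Estimate a gradient from two black-box evaluations along a random direction u.
        u = rng.standard_normal(x.shape)
        g_hat = (f(x + delta * u) - f(x - delta * u)) / (2.0 * delta) * u
        return x - lr * g_hat  # descend along the estimated direction

    # Example: minimize f(x) = ||x||^2 without ever touching its gradient.
    x = np.ones(3)
    for _ in range(500):
        x = zeroth_order_step(lambda z: float(np.sum(z * z)), x)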
Published in:
IEEE Control Systems Letters. 4:524-529
In this paper, we derive a generalization of the Speedy Q-learning (SQL) algorithm that was proposed in the Reinforcement Learning (RL) literature to handle slow convergence of Watkins' Q-learning. In most RL algorithms such as Q-learning, the Bellma…
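For reference, the Watkins' Q-learning update that SQL accelerates applies an empirical Bellman backup to each observed transition (s, a, r, s'); a minimal tabular form with illustrative names, not the paper's generalization:

    import numpy as np

    def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
        # Empirical Bellman operator applied to one transition:
        # target = r + gamma * max_a' Q(s', a').
        target = r + gamma * Q[s_next].max()
        Q[s, a] += alpha * (target - Q[s, a])
        return Q

Roughly speaking, SQL obtains its speedup by combining the empirical Bellman backups of two successive Q-estimates with a more aggressive step-size schedule.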
Published in:
Applied Intelligence. 50:3590-3606
Reinforcement learning (RL) methods learn optimal decisions in the presence of a stationary environment. However, the stationarity assumption on the environment is very restrictive. In many real-world problems like traffic signal control, robotic appli…
Published in:
IEEE Control Systems Letters. 4:55-60
In a discounted reward Markov decision process (MDP), the objective is to find the optimal value function, i.e., the value function corresponding to an optimal policy. This problem reduces to solving a functional equation known as the Bellman equatio…
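For context, the functional equation in question is the Bellman optimality equation (standard form, notation assumed):

    V*(s) = max_a [ r(s, a) + γ Σ_{s'} p(s' | s, a) V*(s') ],

with discount factor γ in (0, 1).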
Published in:
Algorithms for Intelligent Systems. ISBN: 9789811696497
External link:
https://explore.openaire.eu/search/publication?articleId=doi_________::fe0fdbe048e45588edeeab386747560e
https://doi.org/10.1007/978-981-16-9650-3_4