Showing 1 - 10 of 197 for search: '"Shalabh Bhatnagar"'
Published in:
Mathematics of Operations Research. 47:2138-2159
In this paper, we consider the stochastic iterative counterpart of the value iteration scheme wherein only noisy and possibly biased approximations of the Bellman operator are available. We call this counterpart the approximate value iteration (AVI)
Published in:
IEEE Transactions on Automatic Control. 67:4241-4247
Value iteration is a fixed point iteration technique utilized to obtain the optimal value function and policy in a discounted reward Markov Decision Process (MDP). Here, a contraction operator is constructed and applied repeatedly to arrive at the op
Authors:
Prasenjit Karmakar, Shalabh Bhatnagar
Published in:
IEEE Transactions on Automatic Control. 66:5941-5954
This paper compiles several aspects of the dynamics of stochastic approximation algorithms with Markov iterate-dependent noise when the iterates are not known to be stable beforehand. We achieve the same by extending the lock-in probability (i.e. the
Published in:
2022 IEEE International Conference on Systems, Man, and Cybernetics (SMC).
Published in:
IEEE Transactions on Intelligent Transportation Systems. 22:107-118
This paper presents our method for enabling a UAV quadrotor, equipped with a monocular camera, to autonomously avoid collisions with obstacles in unstructured and unknown indoor environments. When compared to obstacle avoidance in ground vehicular ro
Published in:
Algorithms for Intelligent Systems ISBN: 9789811696497
External link:
https://explore.openaire.eu/search/publication?articleId=doi_________::fe0fdbe048e45588edeeab386747560e
https://doi.org/10.1007/978-981-16-9650-3_4
Published in:
Lecture Notes in Computer Science ISBN: 9783031229527
External link:
https://explore.openaire.eu/search/publication?articleId=doi_________::dc8384f4865dff819f36ee5b42154e84
https://doi.org/10.1007/978-3-031-22953-4_6
Published in:
Proceedings of the 14th International Conference on Agents and Artificial Intelligence.
Published in:
Applied Intelligence. 51:1565-1579
Zeroth Order Bayesian Optimization (ZOBO) methods optimize an unknown function based on its black-box evaluations at the query locations. Unlike most optimization procedures, ZOBO methods fail to utilize gradient information even when it is available
Published in:
IEEE Control Systems Letters. 4:524-529
In this paper, we derive a generalization of the Speedy Q-learning (SQL) algorithm that was proposed in the Reinforcement Learning (RL) literature to handle slow convergence of Watkins' Q-learning. In most RL algorithms such as Q-learning, the Bellma