Showing 1 - 10 of 32 for search: '"Cayci, Semih"'
Author:
Müller, Johannes, Cayci, Semih
We study the error introduced by entropy regularization of infinite-horizon discrete discounted Markov decision processes. We show that this error decreases exponentially in the inverse regularization strength both in a weighted KL-divergence and in …
External link:
http://arxiv.org/abs/2406.04163
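For orientation, entropy regularization replaces the hard max of value iteration with a log-sum-exp at temperature tau (the inverse regularization strength is 1/tau). The sketch below runs soft value iteration on a toy random MDP; all sizes, data, and the temperature are illustrative assumptions, not the paper's setup.

import numpy as np

# Toy random MDP (all quantities are illustrative assumptions).
n_states, n_actions, gamma, tau = 4, 2, 0.9, 0.1   # tau = regularization temperature
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
r = rng.uniform(size=(n_states, n_actions))

V = np.zeros(n_states)
for _ in range(500):
    Q = r + gamma * P @ V                      # look-ahead values Q[s, a]
    m = Q.max(axis=1)
    # Soft Bellman backup: a log-sum-exp replaces the hard max of value iteration.
    V = m + tau * np.log(np.exp((Q - m[:, None]) / tau).sum(axis=1))

# Softmax policy induced by the regularized values; as tau -> 0 it sharpens toward
# the unregularized optimal policy, and the entry's result quantifies how fast.
pi = np.exp((Q - Q.max(axis=1, keepdims=True)) / tau)
pi /= pi.sum(axis=1, keepdims=True)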
Author:
Cayci, Semih, Eryilmaz, Atilla
In this paper, we study a natural policy gradient method based on recurrent neural networks (RNNs) for partially-observable Markov decision processes, whereby RNNs are used for policy parameterization and policy evaluation to address the curse of dimensionality …
External link:
http://arxiv.org/abs/2405.18221
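To make the setup concrete: in an RNN-parameterized policy the hidden state acts as a learned summary of the observation history. Below is a hypothetical minimal example (all dimensions, weights, and data are placeholders), not the method analyzed in the paper.

import numpy as np

obs_dim, hid_dim, n_actions = 3, 8, 2
rng = np.random.default_rng(1)
W_in, W_h, W_out = (rng.normal(scale=0.1, size=s)
                    for s in [(hid_dim, obs_dim), (hid_dim, hid_dim), (n_actions, hid_dim)])

def policy_step(h, obs):
    h = np.tanh(W_in @ obs + W_h @ h)          # recurrent update on the new observation
    logits = W_out @ h
    p = np.exp(logits - logits.max())
    return h, p / p.sum()                      # hidden state carries the history forward

h = np.zeros(hid_dim)
for t in range(5):
    h, p = policy_step(h, rng.normal(size=obs_dim))  # placeholder observations
    a = rng.choice(n_actions, p=p)                   # sample an action from the policy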
Kakade's natural policy gradient method has been studied extensively in recent years, showing linear convergence with and without regularization. We study another natural gradient method which is based on the Fisher information matrix of the state-action …
External link:
http://arxiv.org/abs/2403.19448
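For context, a natural gradient method preconditions the vanilla gradient with the inverse of a Fisher information matrix; which Fisher matrix is used (Kakade's versus one built from the state-action distribution) is exactly what distinguishes the methods this entry compares. A generic, hypothetical sketch of the update step:

import numpy as np

def natural_gradient_step(theta, grad_J, fisher, lr=0.1):
    # Solve F d = grad(J) rather than inverting F; lstsq tolerates the rank
    # deficiency that is typical of Fisher information matrices.
    direction, *_ = np.linalg.lstsq(fisher, grad_J, rcond=None)
    return theta + lr * direction

# Placeholder inputs, purely for illustration:
rng = np.random.default_rng(0)
F = rng.normal(size=(5, 5)); F = F @ F.T   # symmetric PSD stand-in for a Fisher matrix
theta = natural_gradient_step(np.zeros(5), rng.normal(size=5), F)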
Author:
Cayci, Semih, Eryilmaz, Atilla
We analyze recurrent neural networks with diagonal hidden-to-hidden weight matrices, trained with gradient descent in the supervised learning setting, and prove that gradient descent can achieve optimality \emph{without} massive overparameterization.
External link:
http://arxiv.org/abs/2402.12241
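The architecture in this entry is easy to picture: the hidden-to-hidden matrix is diagonal, so the recurrence acts coordinate-wise. A minimal sketch, with sizes and data as illustrative assumptions:

import numpy as np

rng = np.random.default_rng(2)
d_in, d_h = 3, 16
U = rng.normal(scale=0.1, size=(d_h, d_in))   # input weights
lam = rng.uniform(0.0, 0.9, size=d_h)         # diagonal of the recurrent matrix

def forward(xs):
    h = np.zeros(d_h)
    for x in xs:
        h = np.tanh(lam * h + U @ x)          # diagonal recurrence: elementwise product
    return h

h_T = forward(rng.normal(size=(10, d_in)))    # run on a placeholder length-10 sequence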
Author:
Cayci, Semih, Eryilmaz, Atilla
In a broad class of reinforcement learning applications, stochastic rewards have heavy-tailed distributions, which lead to infinite second-order moments for stochastic (semi)gradients in policy evaluation and direct policy optimization. In such instances …
External link:
http://arxiv.org/abs/2306.11455
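For context, a common device for heavy-tailed gradients is norm truncation, which restores finite second moments at the cost of some bias. The sketch below is an illustration of that generic idea, not necessarily the estimator the paper analyzes.

import numpy as np

def truncated_gradient(g, threshold):
    # Rescale the (semi)gradient so its norm never exceeds the threshold;
    # heavy-tailed samples get clipped instead of dominating the update.
    norm = np.linalg.norm(g)
    return g if norm <= threshold else g * (threshold / norm)

# A Pareto-tailed sample illustrates the failure mode being guarded against:
rng = np.random.default_rng(0)
g = rng.pareto(1.5, size=4)          # tail index 1.5: infinite variance
g_safe = truncated_gradient(g, threshold=10.0)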
Mean-field games have been used as a theoretical tool to obtain an approximate Nash equilibrium for symmetric and anonymous $N$-player games. However, limiting their applicability, existing theoretical results assume variations of a "population generative model" …
External link:
http://arxiv.org/abs/2212.14449
Natural actor-critic (NAC) and its variants, equipped with the representation power of neural networks, have demonstrated impressive empirical success in solving Markov decision problems with large state spaces. In this paper, we present a finite-time …
External link:
http://arxiv.org/abs/2206.00833
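As a reference point for what NAC computes, here is a tabular sketch with an exact closed-form critic. The toy MDP, step size, and exact policy evaluation are illustrative assumptions; the paper itself treats neural-network function approximation with a finite-time analysis.

import numpy as np

rng = np.random.default_rng(3)
S, A, gamma = 3, 2, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))    # P[s, a, s']
r = rng.uniform(size=(S, A))
theta = np.zeros((S, A))                      # softmax policy logits

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

for _ in range(200):
    pi = softmax(theta)
    # Critic step, done exactly on this toy MDP: solve the Bellman equation.
    P_pi = np.einsum('sa,sap->sp', pi, P)
    r_pi = (pi * r).sum(axis=1)
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    Q = r + gamma * P @ V
    # Actor step: for tabular softmax policies, the natural policy gradient
    # update adds a step-size multiple of the advantage to the logits.
    theta += 0.5 * (Q - V[:, None])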
We consider the reinforcement learning problem for partially observed Markov decision processes (POMDPs) with large or even countably infinite state spaces, where the controller has access to only noisy observations of the underlying controlled Markov …
External link:
http://arxiv.org/abs/2202.09753
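A concrete ingredient of this POMDP setting is the belief state: a posterior over the hidden state, updated from noisy observations. A minimal Bayes-filter sketch on a finite toy chain, with all matrices as illustrative assumptions:

import numpy as np

# Hypothetical 3-state chain with a 2-symbol noisy observation channel.
rng = np.random.default_rng(0)
T = rng.dirichlet(np.ones(3), size=3)     # T[s, s']: transitions under a fixed action
O = rng.dirichlet(np.ones(2), size=3)     # O[s, o]: observation likelihoods

def belief_update(b, obs):
    b_pred = b @ T                        # predict: push the belief through the dynamics
    b_new = b_pred * O[:, obs]            # correct: weight by the observation likelihood
    return b_new / b_new.sum()            # normalize back to a probability vector

b = np.full(3, 1.0 / 3.0)                 # uniform prior over the hidden state
for obs in [0, 1, 1, 0]:                  # placeholder observation sequence
    b = belief_update(b, obs)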
Author:
Cayci, Semih
With the rapid advances in device technology and computational resources, the performance of communication and computing systems has recently achieved massive breakthroughs. In proportion to these improvements, new services developed on these systems require …
In a wide variety of applications including online advertising, contractual hiring, and wireless scheduling, the controller is constrained by a stringent budget on the available resources, which are consumed in a random amount by each action …
External link:
http://arxiv.org/abs/2106.05165
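To make the budgeted setting concrete, here is a sketch of a bandit loop in which each pull consumes a random amount of budget and play stops when the budget runs out. The reward-per-cost UCB index is a common heuristic used purely for illustration, not necessarily the algorithm from the paper, and all distributions are made up.

import numpy as np

rng = np.random.default_rng(4)
K, B = 3, 50.0                                 # K arms, total budget B
mu_r, mu_c = rng.uniform(0.2, 1.0, K), rng.uniform(0.2, 1.0, K)
n, sum_r, sum_c = np.zeros(K), np.zeros(K), np.zeros(K)
t = 0
while B > 0:
    t += 1
    if t <= K:
        a = t - 1                              # initialization: pull each arm once
    else:
        rate = sum_r / sum_c                   # empirical reward-per-unit-cost
        bonus = np.sqrt(2 * np.log(t) / n)     # UCB-style exploration bonus
        a = int(np.argmax(rate + bonus))
    reward = rng.normal(mu_r[a], 0.1)
    cost = max(rng.normal(mu_c[a], 0.1), 0.05) # costs are random but kept positive
    n[a] += 1; sum_r[a] += reward; sum_c[a] += cost
    B -= cost                                  # the random cost depletes the budget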