Showing 1 - 9 of 9 for search: '"Ying, Donghao"'
Bagging is a popular ensemble technique to improve the accuracy of machine learning models. It hinges on the well-established rationale that, by repeatedly retraining on resampled data, the aggregated model exhibits lower variance and hence higher st…
External link:
http://arxiv.org/abs/2405.14741
We investigate safe multi-agent reinforcement learning, where agents seek to collectively maximize an aggregate sum of local objectives while satisfying their own safety constraints. The objective and constraints are described by {\it general utiliti…
External link:
http://arxiv.org/abs/2305.17568
This work is dedicated to the algorithm design in a competitive framework, with the primary goal of learning a stable equilibrium. We consider the dynamic price competition between two firms operating within an opaque marketplace, where each firm lac…
External link:
http://arxiv.org/abs/2305.17567
Stochastic time-varying optimization is an integral part of learning in which the shape of the function changes over time in a non-deterministic manner. This paper considers multiple models of stochastic time variation and analyzes the corresponding…
External link:
http://arxiv.org/abs/2302.11190
We study the scalable multi-agent reinforcement learning (MARL) with general utilities, defined as nonlinear functions of the team's long-term state-action occupancy measure. The objective is to find a localized policy that maximizes the average of t…
External link:
http://arxiv.org/abs/2302.07938
We study Concave Constrained Markov Decision Processes (Concave CMDPs) where both the objective and constraints are defined as concave functions of the state-action occupancy measure. We propose the Variance-Reduced Primal-Dual Policy Gradient Algori…
External link:
http://arxiv.org/abs/2205.10715
We study entropy-regularized constrained Markov decision processes (CMDPs) under the soft-max parameterization, in which an agent aims to maximize the entropy-regularized value function while satisfying constraints on the expected total utility. By l…
External link:
http://arxiv.org/abs/2110.08923
Academic article
We study convex Constrained Markov Decision Processes (CMDPs) in which the objective is concave and the constraints are convex in the state-action occupancy measure. We propose a policy-based primal-dual algorithm that updates the primal variable via…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::e98356b75dd69fdb913dca89092b3eb4
http://arxiv.org/abs/2205.10715