Showing 1 - 10 of 30 for search: '"Kwon, Jeongyeol"'
Previous studies on two-timescale stochastic approximation (SA) mainly focused on bounding mean-squared errors under diminishing stepsize schemes. In this work, we investigate {\it constant} stepsize schemes through the lens of Markov processes, prov
External link:
http://arxiv.org/abs/2410.13067
In many real-world decision problems there is partially observed, hidden or latent information that remains fixed throughout an interaction. Such decision problems can be modeled as Latent Markov Decision Processes (LMDPs), where a latent variable is
External link:
http://arxiv.org/abs/2406.01389
Learning a good history representation is one of the core challenges of reinforcement learning (RL) in partially observable environments. Recent works have shown the advantages of various auxiliary tasks for facilitating representation learning. Howe
External link:
http://arxiv.org/abs/2402.07102
We consider the problem of finding stationary points in bilevel optimization when the lower-level problem is unconstrained and strongly convex. The problem has been extensively studied in recent years; the main technical challenge is to keep track of
External link:
http://arxiv.org/abs/2402.07101
In many interactive decision-making settings, there is latent and unobserved information that remains fixed. Consider, for example, a dialogue system, where complete information about a user, such as the user's preferences, is not given. In such an e
External link:
http://arxiv.org/abs/2310.07596
In this work, we study first-order algorithms for solving Bilevel Optimization (BO) where the objective functions are smooth but possibly nonconvex in both levels and the variables are restricted to closed convex sets. As a first step, we study the l
External link:
http://arxiv.org/abs/2309.01753
Modern machine learning models deployed in the wild can encounter both covariate and semantic shifts, giving rise to the problems of out-of-distribution (OOD) generalization and OOD detection, respectively. While both problems have received significan
External link:
http://arxiv.org/abs/2306.09158
We consider stochastic unconstrained bilevel optimization problems when only the first-order gradient oracles are available. While numerous optimization methods have been proposed for tackling bilevel problems, existing methods either tend to require
External link:
http://arxiv.org/abs/2301.10945
We consider episodic reinforcement learning in reward-mixing Markov decision processes (RMMDPs): at the beginning of every episode nature randomly picks a latent reward model among $M$ candidates and an agent interacts with the MDP throughout the epi
External link:
http://arxiv.org/abs/2210.02594
We consider a multi-armed bandit problem with $M$ latent contexts, where an agent interacts with the environment for an episode of $H$ time steps. Depending on the length of the episode, the learner may not be able to estimate accurately the latent c
External link:
http://arxiv.org/abs/2210.03528