Showing 1 - 10 of 29
for search: '"Xu, Yaosheng"'
Author:
Dai, J. G., Xu, Yaosheng
The weighted-workload-task-allocation (WWTA) load-balancing policy is known to be throughput optimal for parallel server systems with heterogeneous servers. This work concerns the heavy traffic approximation of steady-state performance for parallel server systems …
External link:
http://arxiv.org/abs/2406.04203
This paper examines a continuous-time routing system with general interarrival and service time distributions, operating under the join-the-shortest-queue and power-of-two-choices policies. Under a weaker set of assumptions than those commonly found …
External link:
http://arxiv.org/abs/2405.10876
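The two routing policies named in the abstract can be sketched in a few lines; the function names and tie-breaking rule here are illustrative, not taken from the paper:

```python
import random

def join_shortest_queue(queues):
    """Route an arrival to the globally shortest queue (needs full state)."""
    return min(range(len(queues)), key=lambda k: queues[k])

def power_of_two_choices(queues, rng=random):
    """Sample two distinct queues uniformly at random and route to the
    shorter of the two (ties go to the first sample). This needs only
    two queue-length lookups per arrival instead of a full scan."""
    i, j = rng.sample(range(len(queues)), 2)
    return i if queues[i] <= queues[j] else j
```

Power-of-two-choices is attractive in practice because it captures most of the balancing benefit of join-the-shortest-queue while sampling only two queues per arrival.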
We prove that under a multi-scale heavy traffic condition, the stationary distribution of the scaled queue length vector process in any generalized Jackson network has a product-form limit. Each component in the product form has an exponential distribution …
External link:
http://arxiv.org/abs/2304.01499
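In symbols, the limit described above can be sketched as follows; the notation ($\bar q_j$ for the scaled queue length at station $j$, $m_j$ for the limiting mean) is illustrative rather than the paper's:

```latex
% Hedged sketch of a product-form limit: under the multi-scale heavy
% traffic condition, the scaled stationary queue lengths decouple into
% independent exponentials,
\Pr\left(\bar q_1 > x_1, \dots, \bar q_J > x_J\right)
  \;\longrightarrow\; \prod_{j=1}^{J} e^{-x_j / m_j},
  \qquad x_1, \dots, x_J \ge 0 .
```

The product on the right is exactly the tail of $J$ independent exponential random variables, which is what "product-form limit with exponential components" means.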
Author:
Liang, Litian, Xu, Yaosheng, McAleer, Stephen, Hu, Dailin, Ihler, Alexander, Abbeel, Pieter, Fox, Roy
Published in:
ICML 2022
In temporal-difference reinforcement learning algorithms, variance in value estimation can cause instability and overestimation of the maximal target value. Many algorithms have been proposed to reduce overestimation, including several recent ensemble …
External link:
http://arxiv.org/abs/2209.07670
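One common ensemble remedy for the overestimation described above is to form the TD target from an elementwise minimum over several independent Q-estimates (as in maxmin-style methods). The sketch below is a generic illustration of that idea, not the paper's specific algorithm:

```python
import numpy as np

def ensemble_td_target(rewards, next_q_ensemble, gamma=0.99):
    """Compute a TD target using the elementwise minimum over an
    ensemble of next-state Q estimates, which counteracts the
    overestimation that max-based targets accumulate from noise.

    next_q_ensemble: array of shape (n_heads, batch, n_actions)
    """
    # Min across ensemble heads gives a pessimistic Q table per state...
    pessimistic_q = next_q_ensemble.min(axis=0)         # (batch, n_actions)
    # ...and only then do we take the max over actions.
    return rewards + gamma * pessimistic_q.max(axis=1)  # (batch,)
```

Taking the min before the max is the key ordering: maxing first over each noisy head and then averaging would re-introduce the upward bias.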
Published in:
NeurIPS 2021 Deep RL Workshop
Soft Actor-Critic (SAC) is considered the state-of-the-art algorithm in continuous action space settings. It uses the maximum entropy framework for efficiency and stability, and applies a heuristic temperature Lagrange term to tune the temperature $\alpha$ …
External link:
http://arxiv.org/abs/2112.02852
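The heuristic temperature adjustment mentioned above is usually implemented as a dual-gradient step on $\alpha$ so that the policy's entropy tracks a fixed target. This is the standard SAC recipe sketched generically (not the modification this paper proposes); the function name and learning rate are illustrative:

```python
import numpy as np

def update_temperature(log_alpha, log_probs, target_entropy, lr=3e-4):
    """One dual-gradient step on the SAC temperature.

    The Lagrangian loss is  J(alpha) = -alpha * (log_pi + target_entropy),
    so alpha grows when the policy's entropy (-mean log_pi) falls below
    the target and shrinks otherwise. Optimizing log_alpha keeps alpha > 0.
    """
    alpha = np.exp(log_alpha)
    # dJ/d(log_alpha) = -alpha * (mean log_pi + target_entropy)
    grad = -alpha * (np.mean(log_probs) + target_entropy)
    return log_alpha - lr * grad
```

Parameterizing the step in `log_alpha` rather than `alpha` is a common trick to enforce positivity without clipping.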
Author:
Liang, Litian, Xu, Yaosheng, McAleer, Stephen, Hu, Dailin, Ihler, Alexander, Abbeel, Pieter, Fox, Roy
Temporal-Difference (TD) learning methods, such as Q-Learning, have proven effective at learning a policy to perform control tasks. One issue with methods like Q-Learning is that the value update introduces bias when predicting the TD target of an unf…
External link:
http://arxiv.org/abs/2110.14818
This paper studies the estimation of the coefficient matrix $\Theta^*$ in multivariate regression with hidden variables, $Y = (\Theta^*)^TX + (B^*)^TZ + E$, where $Y$ is an $m$-dimensional response vector, $X$ is a $p$-dimensional vector of observable f…
External link:
http://arxiv.org/abs/2003.13844
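The regression model in the abstract can be made concrete with a small data-generating sketch; the function name, dimensions, and noise scale below are illustrative assumptions, not the paper's setup:

```python
import numpy as np

def generate_hidden_variable_regression(n, p, m, K, rng=None):
    """Draw n samples from Y = Theta^T X + B^T Z + E, where Z is a
    K-dimensional hidden (unobserved) confounder. Returns the observed
    (X, Y) together with the true Theta for reference.

    Shapes: Theta is p x m, B is K x m, X is n x p, Z is n x K,
    and E is n x m Gaussian noise.
    """
    rng = np.random.default_rng(rng)
    theta = rng.normal(size=(p, m))
    B = rng.normal(size=(K, m))
    X = rng.normal(size=(n, p))
    Z = rng.normal(size=(n, K))   # hidden: an estimator never sees Z
    E = 0.1 * rng.normal(size=(n, m))
    Y = X @ theta + Z @ B + E
    return X, Y, theta
```

The estimation challenge is that only `(X, Y)` are observed, so the hidden term `Z @ B` acts as structured confounding noise when recovering `theta`.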
Academic article
This result is not available to unauthenticated users; sign in to view it.
Author:
Bellalah, Mondher (AUTHOR) mondher.bellalah@gmail.com, Xu, Yaosheng (AUTHOR) yx433@cornell.edu, Zhang, Detao (AUTHOR) zhangdetao@sdu.edu.cn
Published in:
Annals of Operations Research, Oct 2019, Vol. 281, Issue 1/2, pp. 397-422 (26 pp.).