Showing 1 - 10 of 194 for the search: '"Tan, Xiaoyang"'
Proximal Policy Optimization (PPO) is a popular model-free reinforcement learning algorithm, esteemed for its simplicity and efficacy. However, due to its inherent on-policy nature, its proficiency in harnessing data from disparate policies is constrained…
External link:
http://arxiv.org/abs/2406.03894
Author:
Wang, Yuhui, Strupl, Miroslav, Faccio, Francesco, Wu, Qingyuan, Liu, Haozhe, Grudzień, Michał, Tan, Xiaoyang, Schmidhuber, Jürgen
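As a hedged illustration of the method this entry builds on (the standard PPO clipped surrogate objective, not the paper's new algorithm; the function name and constants below are illustrative):

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Per-sample PPO clipped surrogate:
    min(r * A, clip(r, 1 - eps, 1 + eps) * A),
    where r = pi_new(a|s) / pi_old(a|s) and A is the advantage estimate.
    Clipping removes the incentive to move the ratio far from 1."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return np.minimum(unclipped, clipped)
```

Because the objective depends on the ratio to the data-collecting policy, the surrogate is only trusted near that policy, which is the on-policy constraint the snippet refers to.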
Learning from multi-step off-policy data collected by a set of policies is a core problem of reinforcement learning (RL). Approaches based on importance sampling (IS) often suffer from large variances due to products of IS ratios. Typical IS-free methods…
External link:
http://arxiv.org/abs/2405.18289
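The variance blow-up from products of IS ratios that the snippet mentions can be seen in a few lines. This is a generic Monte-Carlo illustration under an assumed log-normal ratio distribution, not the paper's setup:

```python
import numpy as np

# Each per-step ratio pi(a|s) / mu(a|s) has mean 1 (mean=-sigma**2/2 in
# log space makes E[ratio] = 1), yet the variance of the cumulative
# product grows roughly exponentially with the horizon.
rng = np.random.default_rng(0)
horizon = 20
ratios = rng.lognormal(mean=-0.02, sigma=0.2, size=(100_000, horizon))
products = np.cumprod(ratios, axis=1)   # cumulative IS weights
var_by_step = products.var(axis=0)      # variance after 1..horizon steps
```

Each cumulative weight is still unbiased, but its variance compounds multiplicatively, which is why long-horizon IS corrections become unusable.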
Retrieval-augmented generation (RAG) has rapidly advanced the language model field, particularly in question-answering (QA) systems. By integrating external documents during the response generation phase, RAG significantly enhances the accuracy and r…
External link:
http://arxiv.org/abs/2402.01767
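The retrieve-then-generate pattern the snippet describes can be sketched minimally; the word-overlap scorer and function names below are illustrative assumptions, not any specific RAG library's API:

```python
# Minimal RAG sketch: score external documents by word overlap with the
# question, then prepend the best match to the prompt handed to a
# generator. A real system would use dense embeddings, not set overlap.
def retrieve(question, documents):
    q_words = set(question.lower().split())
    # pick the document sharing the most words with the question
    return max(documents, key=lambda d: len(q_words & set(d.lower().split())))

def build_prompt(question, documents):
    context = retrieve(question, documents)
    return f"Context: {context}\nQuestion: {question}\nAnswer:"
```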
Most existing salient object detection methods use a U-Net or feature pyramid structure, which simply aggregates feature maps of different scales, ignoring their uniqueness and interdependence and their respective contributions to the final…
External link:
http://arxiv.org/abs/2309.08365
ProxyFormer: Proxy Alignment Assisted Point Cloud Completion with Missing Part Sensitive Transformer
Problems such as equipment defects or limited viewpoints will lead the captured point clouds to be incomplete. Therefore, recovering complete point clouds from partial ones plays a vital role in many practical tasks, and one of the keys lies…
External link:
http://arxiv.org/abs/2302.14435
Author:
Tan, Xiaoyang
Kō No. 24060
Global Environmental Studies Doctorate No. 223
新制||地環||42 (University Library)
Meets the requirements of Article 4, Paragraph 1 of the Degree Regulations
Doctor of Global Environmental Studies
Kyoto University
DFAM
External link:
http://hdl.handle.net/2433/275382
Offline reinforcement learning learns an effective policy from offline datasets without online interaction, and it attracts persistent research attention due to its potential for practical application. However, extrapolation error generated by distribution…
External link:
http://arxiv.org/abs/2301.01298
Advantage Learning (AL) seeks to increase the action gap between the optimal action and its competitors, so as to improve the robustness to estimation errors. However, the method becomes problematic when the optimal action induced by the approximated…
External link:
http://arxiv.org/abs/2203.11677
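For context on the action gap both AL entries refer to, here is a minimal tabular sketch of the standard advantage-learning backup (the assumed textbook form, not this paper's corrected variant; names are illustrative):

```python
import numpy as np

def advantage_learning_backup(q, r, gamma, s, a, s_next, alpha=0.5):
    """One tabular AL backup:
    T_AL Q(s, a) = r + gamma * max_a' Q(s', a') - alpha * (max_b Q(s, b) - Q(s, a)).
    Subtracting alpha times the action gap pushes non-greedy actions down,
    widening the gap between the greedy action and its competitors."""
    bellman_target = r + gamma * q[s_next].max()
    action_gap = q[s].max() - q[s, a]   # zero for the greedy action
    return bellman_target - alpha * action_gap
```

At the greedy action the penalty vanishes, so only competitors are pushed down; when estimation errors make the wrong action greedy, the same penalty entrenches it, which is the failure mode these two papers address.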
Advantage learning (AL) aims to improve the robustness of value-based reinforcement learning against estimation errors with action-gap-based regularization. Unfortunately, the method tends to be unstable in the case of function approximation. In this…
External link:
http://arxiv.org/abs/2203.10445
Author:
Wen, Chao, Xu, Miao, Zhang, Zhilin, Zheng, Zhenzhe, Wang, Yuhui, Liu, Xiangyu, Rong, Yu, Xie, Dong, Tan, Xiaoyang, Yu, Chuan, Xu, Jian, Wu, Fan, Chen, Guihai, Zhu, Xiaoqiang, Zheng, Bo
In online advertising, auto-bidding has become an essential tool for advertisers to optimize their preferred ad performance metrics by simply expressing high-level campaign objectives and constraints. Previous works designed auto-bidding tools from the…
External link:
http://arxiv.org/abs/2106.06224