Showing 1 - 10 of 1,461 results for search: '"Hu, Yifan"'
Policy gradient methods are widely used in reinforcement learning. Yet, the nonconvexity of policy optimization imposes significant challenges in understanding the global convergence of policy gradient methods. For a class of finite-horizon Markov De…
External link:
http://arxiv.org/abs/2409.17138
Stock price prediction is a challenging problem in the field of finance and receives widespread attention. In recent years, with the rapid development of technologies such as deep learning and graph neural networks, more research methods have begun t…
External link:
http://arxiv.org/abs/2409.08282
We consider stochastic optimization when one only has access to biased stochastic oracles of the objective and the gradient, and obtaining stochastic gradients with low biases comes at high costs. This setting captures various optimization paradigms, …
External link:
http://arxiv.org/abs/2408.11084
Conversational Speech Synthesis (CSS) aims to express a target utterance with the proper speaking style in a user-agent conversation setting. Existing CSS methods employ effective multi-modal context modeling techniques to achieve empathy understandi…
External link:
http://arxiv.org/abs/2407.21491
Long-term stability stands as a crucial requirement in data-driven medium-range global weather forecasting. Spectral bias is recognized as the primary contributor to instabilities, as data-driven methods struggle to learn small-scale dynamics. In th…
External link:
http://arxiv.org/abs/2407.01598
The location of knowledge within Generative Pre-trained Transformer (GPT)-like models has seen extensive recent investigation. However, much of the work is focused on determining the locations of individual facts, with the end goal being the editing…
External link:
http://arxiv.org/abs/2406.15940
Transformer-based and MLP-based methods have emerged as leading approaches in time series forecasting (TSF). While Transformer-based methods excel in capturing long-range dependencies, they suffer from high computational complexities and tend to over…
External link:
http://arxiv.org/abs/2406.03751
In various applications, the optimal policy in a strategic decision-making problem depends both on the environmental configuration and exogenous events. For these settings, we introduce Bilevel Optimization with Contextual Markov Decision Processes (…
External link:
http://arxiv.org/abs/2406.01575
Author:
Ramesh, Shyam Sundhar, Hu, Yifan, Chaimalas, Iason, Mehta, Viraj, Sessa, Pier Giuseppe, Ammar, Haitham Bou, Bogunovic, Ilija
Adapting large language models (LLMs) for specific tasks usually involves fine-tuning through reinforcement learning with human feedback (RLHF) on preference data. While these data often come from diverse labelers' groups (e.g., different demographic…
External link:
http://arxiv.org/abs/2405.20304
We develop and analyze algorithms for instrumental variable regression by viewing the problem as a conditional stochastic optimization problem. In the context of least-squares instrumental variable regression, our algorithms neither require matrix in…
External link:
http://arxiv.org/abs/2405.19463