Showing 1 - 10 of 497 for search: '"Gu, Quanquan"'
Mastering multiple tasks through exploration and learning in an environment poses a significant challenge in reinforcement learning (RL). Unsupervised RL has been introduced to address this challenge by training policies with intrinsic rewards rather
External link:
http://arxiv.org/abs/2406.16255
Traditional reinforcement learning from human feedback (RLHF) approaches relying on parametric models like the Bradley-Terry model fall short in capturing the intransitivity and irrationality in human preferences. Recent advancements suggest that dir
External link:
http://arxiv.org/abs/2405.00675
The $k$-parity problem is a classical problem in computational complexity and algorithmic theory, serving as a key benchmark for understanding computational classes. In this paper, we solve the $k$-parity problem with stochastic gradient descent (SGD
External link:
http://arxiv.org/abs/2404.12376
Author:
Han, Jun; Chen, Zixiang; Li, Yongqian; Kou, Yiwen; Halperin, Eran; Tillman, Robert E.; Gu, Quanquan
Electronic health records (EHRs) are a pivotal data source that enables numerous applications in computational medicine, e.g., disease progression prediction, clinical trial design, and health economics and outcomes research. Despite wide usability,
External link:
http://arxiv.org/abs/2404.12314
Learning from human feedback plays an important role in aligning generative models, such as large language models (LLMs). However, the effectiveness of this approach can be influenced by adversaries, who may intentionally provide misleading preference
External link:
http://arxiv.org/abs/2404.10776
We study the constant regret guarantees in reinforcement learning (RL). Our objective is to design an algorithm that incurs only finite regret over infinite episodes with high probability. We introduce an algorithm, Cert-LSVI-UCB, for misspecified li
External link:
http://arxiv.org/abs/2404.10745
Contextual dueling bandits, where a learner compares two options based on context and receives feedback indicating which was preferred, extend classic dueling bandits by incorporating contextual information for decision-making and preference learnin
External link:
http://arxiv.org/abs/2404.06013
Antibody design, a crucial task with significant implications across various disciplines such as therapeutics and biology, presents considerable challenges due to its intricate nature. In this paper, we tackle antigen-specific antibody sequence-struc
External link:
http://arxiv.org/abs/2403.16576
The conformational landscape of proteins is crucial to understanding their functionality in complex biological processes. Traditional physics-based computational methods, such as molecular dynamics (MD) simulations, suffer from rare event sampling an
External link:
http://arxiv.org/abs/2403.14088
Recently, 3D generative models have shown promising performance in structure-based drug design by learning to generate ligands given target binding sites. However, only modeling the target-ligand distribution can hardly fulfill one of the main goals
External link:
http://arxiv.org/abs/2403.13829