Search results

Report

Negative Preference Optimization: From Catastrophic Collapse to Effective Unlearning

Authors: Zhang, Ruiqi; Lin, Licong; Bai, Yu; Mei, Song

Large Language Models (LLMs) often memorize sensitive, private, or copyrighted data during pre-training. LLM unlearning aims to eliminate the influence of undesirable data from the pre-trained model while preserving the model's utilities on other tasks…

External link: http://arxiv.org/abs/2404.05868

Report

Is Inverse Reinforcement Learning Harder than Standard Reinforcement Learning? A Theoretical Perspective

Authors: Zhao, Lei; Wang, Mengdi; Bai, Yu

Inverse Reinforcement Learning (IRL) -- the problem of learning reward functions from demonstrations of an expert policy -- plays a critical role in developing intelligent systems. While widely used in applications, theoretical understandings…

External link: http://arxiv.org/abs/2312.00054

Report

Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining

Authors: Lin, Licong; Bai, Yu; Mei, Song

Large transformer models pretrained on offline reinforcement learning datasets have demonstrated remarkable in-context reinforcement learning (ICRL) capabilities, where they can make good decisions when prompted with interaction trajectories from unseen…

External link: http://arxiv.org/abs/2310.08566

Report

What can a Single Attention Layer Learn? A Study Through the Random Features Lens

Authors: Fu, Hengyu; Guo, Tianyu; Bai, Yu; Mei, Song

Attention layers -- which map a sequence of inputs to a sequence of outputs -- are core building blocks of the Transformer architecture which has achieved significant breakthroughs in modern artificial intelligence. This paper presents a rigorous theoretical…

External link: http://arxiv.org/abs/2307.11353

Report

Sample-Efficient Learning of POMDPs with Multiple Observations In Hindsight

Authors: Guo, Jiacheng; Chen, Minshuo; Wang, Huan; Xiong, Caiming; Wang, Mengdi; Bai, Yu

This paper studies the sample-efficiency of learning in Partially Observable Markov Decision Processes (POMDPs), a challenging problem in reinforcement learning that is known to be exponentially hard in the worst-case. Motivated by real-world setting…

External link: http://arxiv.org/abs/2307.02884

Report

Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection

Authors: Bai, Yu; Chen, Fan; Wang, Huan; Xiong, Caiming; Mei, Song

Neural sequence models based on the transformer architecture have demonstrated remarkable in-context learning (ICL) abilities, where they can perform new tasks when prompted with training and test examples, without any parameter update to the…

External link: http://arxiv.org/abs/2306.04637

Report

Improved Online Conformal Prediction via Strongly Adaptive Online Learning

Authors: Bhatnagar, Aadyot; Wang, Huan; Xiong, Caiming; Bai, Yu

We study the problem of uncertainty quantification via prediction sets, in an online setting where the data distribution may vary arbitrarily over time. Recent work develops online conformal prediction techniques that leverage regret minimization algorithms…

External link: http://arxiv.org/abs/2302.07869

Report

Breaking the Curse of Multiagency: Provably Efficient Decentralized Multi-Agent RL with Function Approximation

Authors: Wang, Yuanhao; Liu, Qinghua; Bai, Yu; Jin, Chi

A unique challenge in Multi-Agent Reinforcement Learning (MARL) is the curse of multiagency, where the description length of the game as well as the complexity of many existing learning algorithms scale exponentially with the number of agents. While…

External link: http://arxiv.org/abs/2302.06606

Report

Offline Learning in Markov Games with General Function Approximation

Authors: Zhang, Yuheng; Bai, Yu; Jiang, Nan

We study offline multi-agent reinforcement learning (RL) in Markov games, where the goal is to learn an approximate equilibrium -- such as Nash equilibrium and (Coarse) Correlated Equilibrium -- from an offline dataset pre-collected from the game. Ex…

External link: http://arxiv.org/abs/2302.02571

Report

Lower Bounds for Learning in Revealing POMDPs

Authors: Chen, Fan; Wang, Huan; Xiong, Caiming; Mei, Song; Bai, Yu

This paper studies the fundamental limits of reinforcement learning (RL) in the challenging partially observable setting. While it is well-established that learning in Partially Observable Markov Decision Processes (POMDPs) requires exponential…

External link: http://arxiv.org/abs/2302.01333
