Showing 1 - 10 of 19,822 for search: '"Kakade AS"'
Author:
Van Roy, Benjamin, Dong, Shi
Du, Kakade, Wang, and Yang recently established intriguing lower bounds on sample complexity, which suggest that reinforcement learning with a misspecified representation is intractable. Another line of work, which centers around a statistic called t…
External link:
http://arxiv.org/abs/1911.07910
Recently, Wang et al. (2020) showed a highly intriguing hardness result for batch reinforcement learning (RL) with linearly realizable value function and good feature coverage in the finite-horizon case. In this note we show that once adapted to the…
External link:
http://arxiv.org/abs/2011.01075
Author:
Jelassi, Samy, Mohri, Clara, Brandfonbrener, David, Gu, Alex, Vyas, Nikhil, Anand, Nikhil, Alvarez-Melis, David, Li, Yuanzhi, Kakade, Sham M., Malach, Eran
The Mixture-of-Experts (MoE) architecture enables a significant increase in the total number of model parameters with minimal computational overhead. However, it is not clear what performance tradeoffs, if any, exist between MoEs and standard dense t…
External link:
http://arxiv.org/abs/2410.19034
Author:
Prabhakar, Akshara, Li, Yuanzhi, Narasimhan, Karthik, Kakade, Sham, Malach, Eran, Jelassi, Samy
Low-Rank Adaptation (LoRA) is a popular technique for parameter-efficient fine-tuning of Large Language Models (LLMs). We study how different LoRA modules can be merged to achieve skill composition -- testing the performance of the merged model on a…
External link:
http://arxiv.org/abs/2410.13025
While transformers have been at the core of most recent advancements in sequence generative models, their computational cost remains quadratic in sequence length. Several subquadratic architectures have been proposed to address this computational iss…
External link:
http://arxiv.org/abs/2410.12982
This paper addresses the capacitated periodic review inventory control problem, focusing on a retailer managing multiple products with limited shared resources, such as storage or inbound labor at a facility. Specifically, this paper is motivated by…
External link:
http://arxiv.org/abs/2410.02817
Author:
Vyas, Nikhil, Morwani, Depen, Zhao, Rosie, Shapira, Itai, Brandfonbrener, David, Janson, Lucas, Kakade, Sham
There is growing evidence of the effectiveness of Shampoo, a higher-order preconditioning method, over Adam in deep learning optimization tasks. However, Shampoo's drawbacks include additional hyperparameters and computational overhead when compared…
External link:
http://arxiv.org/abs/2409.11321
We initiate the study of Multi-Agent Reinforcement Learning from Human Feedback (MARLHF), exploring both theoretical foundations and empirical validations. We define the task as identifying Nash equilibrium from a preference-only offline dataset in g…
External link:
http://arxiv.org/abs/2409.00717
Training language models becomes increasingly expensive with scale, prompting numerous attempts to improve optimization efficiency. Despite these efforts, the Adam optimizer remains the most widely used, due to a prevailing view that it is the most e…
External link:
http://arxiv.org/abs/2407.07972
Length generalization refers to the ability to extrapolate from short training sequences to long test sequences and is a challenge for current large language models. While prior work has proposed some architecture or data format changes to achieve le…
External link:
http://arxiv.org/abs/2407.03310