Showing 1 - 10 of 19,822 for search: '"Kakade AS"'
Author:
Van Roy, Benjamin, Dong, Shi
Du, Kakade, Wang, and Yang recently established intriguing lower bounds on sample complexity, which suggest that reinforcement learning with a misspecified representation is intractable. Another line of work, which centers around a statistic called t…
External link:
http://arxiv.org/abs/1911.07910
Recently, Wang et al. (2020) showed a highly intriguing hardness result for batch reinforcement learning (RL) with linearly realizable value function and good feature coverage in the finite-horizon case. In this note we show that once adapted to the…
External link:
http://arxiv.org/abs/2011.01075
Author:
Jelassi, Samy, Mohri, Clara, Brandfonbrener, David, Gu, Alex, Vyas, Nikhil, Anand, Nikhil, Alvarez-Melis, David, Li, Yuanzhi, Kakade, Sham M., Malach, Eran
The Mixture-of-Experts (MoE) architecture enables a significant increase in the total number of model parameters with minimal computational overhead. However, it is not clear what performance tradeoffs, if any, exist between MoEs and standard dense t…
External link:
http://arxiv.org/abs/2410.19034
Author:
Prabhakar, Akshara, Li, Yuanzhi, Narasimhan, Karthik, Kakade, Sham, Malach, Eran, Jelassi, Samy
Low-Rank Adaptation (LoRA) is a popular technique for parameter-efficient fine-tuning of Large Language Models (LLMs). We study how different LoRA modules can be merged to achieve skill composition -- testing the performance of the merged model on a…
External link:
http://arxiv.org/abs/2410.13025
While transformers have been at the core of most recent advancements in sequence generative models, their computational cost remains quadratic in sequence length. Several subquadratic architectures have been proposed to address this computational iss…
External link:
http://arxiv.org/abs/2410.12982
This paper addresses the capacitated periodic review inventory control problem, focusing on a retailer managing multiple products with limited shared resources, such as storage or inbound labor at a facility. Specifically, this paper is motivated by…
External link:
http://arxiv.org/abs/2410.02817
Author:
Vyas, Nikhil, Morwani, Depen, Zhao, Rosie, Shapira, Itai, Brandfonbrener, David, Janson, Lucas, Kakade, Sham
There is growing evidence of the effectiveness of Shampoo, a higher-order preconditioning method, over Adam in deep learning optimization tasks. However, Shampoo's drawbacks include additional hyperparameters and computational overhead when compared…
External link:
http://arxiv.org/abs/2409.11321
We initiate the study of Multi-Agent Reinforcement Learning from Human Feedback (MARLHF), exploring both theoretical foundations and empirical validations. We define the task as identifying Nash equilibrium from a preference-only offline dataset in g…
External link:
http://arxiv.org/abs/2409.00717
Training language models becomes increasingly expensive with scale, prompting numerous attempts to improve optimization efficiency. Despite these efforts, the Adam optimizer remains the most widely used, due to a prevailing view that it is the most e…
External link:
http://arxiv.org/abs/2407.07972
Length generalization refers to the ability to extrapolate from short training sequences to long test sequences and is a challenge for current large language models. While prior work has proposed some architecture or data format changes to achieve le…
External link:
http://arxiv.org/abs/2407.03310