Showing 1 - 10 of 781 for search: '"Sham M"'
Author:
Jelassi, Samy, Mohri, Clara, Brandfonbrener, David, Gu, Alex, Vyas, Nikhil, Anand, Nikhil, Alvarez-Melis, David, Li, Yuanzhi, Kakade, Sham M., Malach, Eran
The Mixture-of-Experts (MoE) architecture enables a significant increase in the total number of model parameters with minimal computational overhead. However, it is not clear what performance tradeoffs, if any, exist between MoEs and standard dense t…
External link:
http://arxiv.org/abs/2410.19034
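The snippet above notes that MoE layers add parameters without proportional compute: only a few experts run per token. A minimal top-k routing sketch of that idea, with made-up scalar "experts" and random gate weights purely for illustration (not the architecture from the paper):

```python
import math
import random

random.seed(0)

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def moe_forward(x, experts, gate_weights, k=2):
    # Score every expert with a learned gate, but evaluate only the
    # top-k: compute cost tracks k, not the total expert count.
    scores = softmax([sum(w * xi for w, xi in zip(gw, x)) for gw in gate_weights])
    topk = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    norm = sum(scores[i] for i in topk)
    # Combine the selected experts, reweighted to sum to 1.
    return sum(scores[i] / norm * experts[i](x) for i in topk)

# Toy setup: 8 scalar-output "experts" over a 4-dim input.
experts = [lambda x, c=c: c * sum(x) for c in range(8)]
gate_weights = [[random.gauss(0, 1) for _ in range(4)] for _ in range(8)]
y = moe_forward([0.5, -0.2, 0.1, 0.3], experts, gate_weights, k=2)
```

With k=1 the router degenerates to picking a single expert, whose output passes through unscaled; raising k trades compute for a smoother mixture.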
We initiate the study of Multi-Agent Reinforcement Learning from Human Feedback (MARLHF), exploring both theoretical foundations and empirical validations. We define the task as identifying Nash equilibrium from a preference-only offline dataset in g…
External link:
http://arxiv.org/abs/2409.00717
Author:
Wang, Ziqi, Zhang, Hanlin, Li, Xiner, Huang, Kuan-Hao, Han, Chi, Ji, Shuiwang, Kakade, Sham M., Peng, Hao, Ji, Heng
Position bias has proven to be a prevalent issue of modern language models (LMs), where the models prioritize content based on its position within the given context. This bias often leads to unexpected model failures and hurts performance, robustness…
External link:
http://arxiv.org/abs/2407.01100
Author:
Zhang, Edwin, Zhu, Vincent, Saphra, Naomi, Kleiman, Anat, Edelman, Benjamin L., Tambe, Milind, Kakade, Sham M., Malach, Eran
Generative models are trained with the simple objective of imitating the conditional probability distribution induced by the data they are trained on. Therefore, when trained on data generated by humans, we may not expect the artificial model to outp…
External link:
http://arxiv.org/abs/2406.11741
Empirically, large-scale deep learning models often satisfy a neural scaling law: the test error of the trained model improves polynomially as the model size and data size grow. However, conventional wisdom suggests the test error consists of approxi…
External link:
http://arxiv.org/abs/2406.08466
Author:
Shen, Ethan, Fan, Alan, Pratt, Sarah M., Park, Jae Sung, Wallingford, Matthew, Kakade, Sham M., Holtzman, Ari, Krishna, Ranjay, Farhadi, Ali, Kusupati, Aditya
Many applications today provide users with multiple auto-complete drafts as they type, including GitHub's code completion, Gmail's smart compose, and Apple's messaging auto-suggestions. Under the hood, language models support this by running an autor…
External link:
http://arxiv.org/abs/2405.18400
The $k$-parity problem is a classical problem in computational complexity and algorithmic theory, serving as a key benchmark for understanding computational classes. In this paper, we solve the $k$-parity problem with stochastic gradient descent (SGD…
External link:
http://arxiv.org/abs/2404.12376
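For reference, the $k$-parity target the snippet mentions is simply the product of $k$ coordinates of a $\pm 1$ input, which is $+1$ exactly when an even number of those coordinates are $-1$. A minimal definition (the convention of taking the first $k$ coordinates is just for illustration):

```python
def k_parity(x, k):
    # Parity over {-1,+1} inputs: product of the first k coordinates.
    # +1 iff an even number of the first k entries are -1.
    p = 1
    for xi in x[:k]:
        p *= xi
    return p

assert k_parity([1, -1, -1, 1, -1], k=3) == 1   # two -1s among first 3
assert k_parity([-1, 1, 1, 1, 1], k=3) == -1    # one -1 among first 3
```

The function is hard for gradient methods to learn at large $k$ because it depends on a global interaction among the relevant bits, which is what makes it a standard benchmark.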
Transformers are the dominant architecture for sequence modeling, but there is growing interest in models that use a fixed-size latent state that does not depend on the sequence length, which we refer to as "generalized state space models" (GSSMs). I…
External link:
http://arxiv.org/abs/2402.01032
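The fixed-size latent state mentioned above can be illustrated with a linear recurrence $h_t = A h_{t-1} + B x_t$: the state summarizing the sequence never grows with its length, unlike a transformer's key-value cache. The matrices below are arbitrary toy values, and this is a generic state-space sketch rather than any specific GSSM from the paper:

```python
def gssm_step(h, x, A, B):
    # One recurrence step: h_t = A @ h_{t-1} + B * x_t, written out
    # element-wise for a scalar input x and an n-dim state h.
    n = len(h)
    return [sum(A[i][j] * h[j] for j in range(n)) + B[i] * x for i in range(n)]

def run_gssm(xs, A, B):
    # Fold an arbitrarily long sequence into a constant-size state.
    h = [0.0] * len(B)
    for x in xs:
        h = gssm_step(h, x, A, B)
    return h

A = [[0.9, 0.0], [0.1, 0.5]]  # toy 2x2 state-transition matrix
B = [1.0, 0.0]                # toy input projection
h = run_gssm([1.0, 2.0, 3.0], A, B)
```

Whatever the sequence length, `h` stays two-dimensional; memory of early tokens survives only through what the recurrence retains in that state.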
We consider the problem of decentralized multi-agent reinforcement learning in Markov games. A fundamental question is whether there exist algorithms that, when adopted by all agents and run independently in a decentralized fashion, lead to no-regret…
External link:
http://arxiv.org/abs/2303.12287
This paper considers the problem of learning a single ReLU neuron with squared loss (a.k.a., ReLU regression) in the overparameterized regime, where the input dimension can exceed the number of samples. We analyze a Perceptron-type algorithm called G…
External link:
http://arxiv.org/abs/2303.02255
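The snippet describes a Perceptron-type algorithm for a single ReLU neuron. A generic sketch of that family of updates, $w \leftarrow w + \eta\,(y - \mathrm{relu}(w \cdot x))\,x$, which skips differentiating through the ReLU; this is an illustrative stand-in, not necessarily the exact algorithm analyzed in the paper, and the teacher weights and learning rate are made up:

```python
import random

random.seed(1)

def relu(z):
    return max(z, 0.0)

def relu_regression(samples, dim, lr=0.1, epochs=50):
    # Perceptron-style update for one ReLU neuron:
    #   w <- w + lr * (y - relu(w.x)) * x
    # Note there is no ReLU derivative in the update, which is what
    # distinguishes this family from plain SGD on the squared loss.
    w = [0.0] * dim
    for _ in range(epochs):
        for x, y in samples:
            pred = relu(sum(wi * xi for wi, xi in zip(w, x)))
            w = [wi + lr * (y - pred) * xi for wi, xi in zip(w, x)]
    return w

# Noiseless realizable data from a toy teacher neuron w* = [1.0, -0.5].
w_star = [1.0, -0.5]
data = []
for _ in range(200):
    x = [random.gauss(0, 1), random.gauss(0, 1)]
    data.append((x, relu(sum(a * b for a, b in zip(w_star, x)))))
w = relu_regression(data, dim=2)
```

Because the data here are noiseless and realizable, the teacher weights are a fixed point of the update, so the iterates settle near them; the overparameterized analysis in the paper concerns the harder regime where dimension exceeds sample count.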