Showing 1 - 10 of 178 for search: '"Oymak, Samet"'
Author:
Xiong, Zheyang, Cai, Ziyang, Cooper, John, Ge, Albert, Papageorgiou, Vasilis, Sifakis, Zack, Giannou, Angeliki, Lin, Ziqian, Yang, Liu, Agarwal, Saurabh, Chrysos, Grigorios G., Oymak, Samet, Lee, Kangwook, Papailiopoulos, Dimitris
Large Language Models (LLMs) have demonstrated remarkable in-context learning (ICL) capabilities. In this study, we explore a surprising phenomenon related to ICL: LLMs can perform multiple, computationally distinct ICL tasks simultaneously, during a…
External link:
http://arxiv.org/abs/2410.05603
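The snippet above describes task superposition: several distinct ICL tasks answered within a single inference call. A toy prompt construction that probes this behavior might look like the following; the two demo tasks and the "->" format are invented for illustration, not the paper's protocol.

```python
# Illustrative probe for in-context task superposition: a single prompt
# interleaves demonstrations of two computationally distinct ICL tasks
# (string uppercasing and addition), then queries both at once.
# Task choices and formatting here are hypothetical, not the paper's setup.

task_a = [("cat", "CAT"), ("dog", "DOG"), ("fish", "FISH")]  # uppercase task
task_b = [("2+3", "5"), ("7+1", "8"), ("4+4", "8")]          # addition task

def build_superposed_prompt(query_a: str, query_b: str) -> str:
    lines = []
    for (xa, ya), (xb, yb) in zip(task_a, task_b):
        lines.append(f"{xa} -> {ya}")
        lines.append(f"{xb} -> {yb}")
    lines.append(f"{query_a} ->")   # the model must infer which task applies
    lines.append(f"{query_b} ->")
    return "\n".join(lines)

print(build_superposed_prompt("bird", "5+2"))
```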
Recent research has shown that Transformers with linear attention are capable of in-context learning (ICL) by implementing a linear estimator through gradient descent steps. However, the existing results on the optimization landscape apply under stylized…
External link:
http://arxiv.org/abs/2407.10005
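The equivalence mentioned above, linear attention implementing a gradient descent step on an in-context least-squares problem, can be checked numerically. A minimal sketch: starting from w0 = 0, one GD step predicts w1·x_q = eta · Σ_i y_i (x_iᵀ x_q), which is exactly a softmax-free attention readout with keys x_i, values y_i, and query x_q. All dimensions and the learning rate are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, eta = 4, 16, 0.1
X = rng.normal(size=(n, d))        # in-context inputs
w_star = rng.normal(size=d)
y = X @ w_star                      # in-context labels
x_q = rng.normal(size=d)            # query token

# Path 1: explicit gradient step on L(w) = 0.5 * sum_i (x_i·w - y_i)^2
w1 = np.zeros(d) - eta * (X.T @ (X @ np.zeros(d) - y))
pred_gd = w1 @ x_q

# Path 2: linear attention (no softmax): value-weighted key/query inner products
pred_attn = eta * np.sum(y * (X @ x_q))

print(pred_gd, pred_attn)           # identical up to float error
assert np.allclose(pred_gd, pred_attn)
```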
The transformer architecture has catalyzed revolutionary advances in language modeling. However, recent architectural recipes, such as state-space models, have bridged the performance gap. Motivated by this, we examine the benefits of Convolution-Augmented…
External link:
http://arxiv.org/abs/2407.05591
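One common way to augment attention with convolution, sketched below, is a short depthwise causal Conv1d that mixes neighboring tokens before standard self-attention; the placement, kernel size, and residual wiring here are assumptions, not necessarily the paper's architecture.

```python
import torch
import torch.nn as nn

class ConvAugmentedBlock(nn.Module):
    def __init__(self, dim: int, heads: int = 4, kernel: int = 3):
        super().__init__()
        # depthwise conv; extra padding is trimmed to keep it causal
        self.conv = nn.Conv1d(dim, dim, kernel, groups=dim, padding=kernel - 1)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):                       # x: (batch, seq, dim)
        seq = x.size(1)
        h = self.conv(x.transpose(1, 2))[..., :seq].transpose(1, 2)  # causal trim
        h = self.norm(x + h)
        mask = torch.triu(torch.ones(seq, seq, dtype=torch.bool), 1)  # no future
        out, _ = self.attn(h, h, h, attn_mask=mask)
        return h + out

block = ConvAugmentedBlock(dim=32)
print(block(torch.randn(2, 10, 32)).shape)      # torch.Size([2, 10, 32])
```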
Recent successes in natural language processing have led to the proliferation of large language models (LLMs) by multiple providers. Each LLM offering has different inference accuracy, monetary cost, and latency, and their accuracy further depends on…
External link:
http://arxiv.org/abs/2404.13082
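The routing problem this abstract describes, trading off accuracy, monetary cost, and latency across LLM providers, can be made concrete with a toy rule: pick the cheapest model that clears an accuracy target within a latency budget. Provider names and all numbers below are fabricated for illustration.

```python
PROVIDERS = [
    # name,        est. accuracy by task,          $ / 1k tokens, latency (s)
    ("small-llm",  {"qa": 0.72, "code": 0.55}, 0.0002, 0.4),
    ("medium-llm", {"qa": 0.84, "code": 0.74}, 0.0010, 0.9),
    ("large-llm",  {"qa": 0.91, "code": 0.88}, 0.0060, 2.1),
]

def route(task: str, min_acc: float, max_latency: float) -> str:
    """Cheapest provider meeting the accuracy target under the latency cap."""
    ok = [(name, cost) for name, acc, cost, lat in PROVIDERS
          if acc.get(task, 0.0) >= min_acc and lat <= max_latency]
    if not ok:
        raise ValueError("no provider satisfies the constraints")
    return min(ok, key=lambda p: p[1])[0]

print(route("qa", min_acc=0.80, max_latency=1.0))   # -> medium-llm
```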
Transformer-based language models are trained on large datasets to predict the next token given an input sequence. Despite this simple training objective, they have led to revolutionary advances in natural language processing. Underlying this success…
External link:
http://arxiv.org/abs/2403.08081
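The training objective named above is simple to state in code: at each position the model's logits are scored against the token one step ahead. A self-contained sketch with random logits standing in for a transformer's output:

```python
import torch
import torch.nn.functional as F

vocab, seq, batch = 100, 12, 2
tokens = torch.randint(vocab, (batch, seq))    # input token ids
logits = torch.randn(batch, seq, vocab)        # stand-in model output

# shift: positions 0..seq-2 predict tokens 1..seq-1
loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, vocab),         # predictions
    tokens[:, 1:].reshape(-1),                 # next-token targets
)
print(loss.item())
```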
Modern language models rely on the transformer architecture and attention mechanism to perform language understanding and text generation. In this work, we study learning a 1-layer self-attention model from a set of prompts and associated output data…
External link:
http://arxiv.org/abs/2402.13512
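A minimal version of the setup described above, fitting a single self-attention layer to (prompt, label) pairs by gradient descent, might look like this; the synthetic data, scalar regression target, and last-token readout are illustrative choices, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

d, seq, n = 8, 6, 256
torch.manual_seed(0)
X = torch.randn(n, seq, d)                     # prompts
y = X[:, :-1].mean(dim=1) @ torch.randn(d)     # synthetic scalar labels

class OneLayerAttention(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.Wq, self.Wk, self.Wv = (nn.Linear(d, d, bias=False) for _ in range(3))
        self.readout = nn.Linear(d, 1, bias=False)

    def forward(self, x):
        # softmax attention over the prompt, then read out the final token
        att = torch.softmax(self.Wq(x) @ self.Wk(x).transpose(1, 2) / d**0.5, -1)
        return self.readout((att @ self.Wv(x))[:, -1]).squeeze(-1)

model = OneLayerAttention(d)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for step in range(200):
    loss = ((model(X) - y) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
print(f"final MSE: {loss.item():.4f}")
```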
Author:
Chang, Xiangyu, Ahmed, Sk Miraj, Krishnamurthy, Srikanth V., Guler, Basak, Swami, Ananthram, Oymak, Samet, Roy-Chowdhury, Amit K.
The key premise of federated learning (FL) is to train ML models across a diverse set of data-owners (clients), without exchanging local data. An overarching challenge to this date is client heterogeneity, which may arise not only from variations in…
External link:
http://arxiv.org/abs/2402.08769
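The FL training pattern the abstract refers to can be sketched as a bare-bones FedAvg round: each client runs local gradient steps on its own data, and the server averages the resulting weights by sample count. Client heterogeneity is simulated here with a simple label shift; everything else is a toy choice, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(1)
d, eta, local_steps = 5, 0.05, 10

def make_client(n, shift):                     # heterogeneity via label shift
    X = rng.normal(size=(n, d))
    y = X @ rng.normal(size=d) + shift
    return X, y

clients = [make_client(n, s) for n, s in [(40, 0.0), (120, 1.0), (20, -2.0)]]
w_global = np.zeros(d)

for rnd in range(5):                           # communication rounds
    updates, weights = [], []
    for X, y in clients:
        w = w_global.copy()
        for _ in range(local_steps):           # local full-batch gradient steps
            w -= eta * X.T @ (X @ w - y) / len(y)
        updates.append(w); weights.append(len(y))
    w_global = np.average(updates, axis=0, weights=weights)  # FedAvg step
    mse = np.mean([np.mean((X @ w_global - y) ** 2) for X, y in clients])
    print(f"round {rnd}: avg client MSE {mse:.3f}")
```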
Author:
Park, Jongho, Park, Jaeseung, Xiong, Zheyang, Lee, Nayoung, Cho, Jaewoong, Oymak, Samet, Lee, Kangwook, Papailiopoulos, Dimitris
State-space models (SSMs), such as Mamba (Gu & Dao, 2023), have been proposed as alternatives to Transformer networks in language modeling, by incorporating gating, convolutions, and input-dependent token selection to mitigate the quadratic cost of…
External link:
http://arxiv.org/abs/2402.04248
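The ingredients listed above, gating and input-dependent selection, reduce to a short recurrence when stripped to per-channel scalars: the state decay and write gate are functions of the current input, unlike a fixed linear SSM. The parameterization below is an invented simplification, not Mamba's actual discretized dynamics.

```python
import numpy as np

rng = np.random.default_rng(0)
d, seq = 4, 8
x = rng.normal(size=(seq, d))                  # token stream
Wa, Wb, Wg = (rng.normal(size=(d, d)) * 0.3 for _ in range(3))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

h = np.zeros(d)
outputs = []
for t in range(seq):
    a_t = sigmoid(x[t] @ Wa)                   # input-dependent state decay
    b_t = sigmoid(x[t] @ Wb)                   # input-dependent write gate
    h = a_t * h + b_t * x[t]                   # linear-time recurrence
    outputs.append(h * sigmoid(x[t] @ Wg))     # output gating
print(np.stack(outputs).shape)                 # (8, 4)
```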
Modern classification problems exhibit heterogeneities across individual classes: Each class may have unique attributes, such as sample size, label quality, or predictability (easy vs difficult), and variable importance at test-time. Without care…
External link:
http://arxiv.org/abs/2401.14343
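One standard response to the per-class heterogeneity described above is to reweight the training loss so that rare classes count more. The inverse-frequency weighting below is a common baseline, not necessarily the treatment proposed in this paper.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
counts = torch.tensor([500.0, 50.0, 5.0])        # imbalanced class sizes
weights = counts.sum() / (len(counts) * counts)  # inverse-frequency weights

logits = torch.randn(8, 3)                       # stand-in model outputs
labels = torch.randint(3, (8,))

plain = F.cross_entropy(logits, labels)
weighted = F.cross_entropy(logits, labels, weight=weights)
print(f"plain {plain.item():.3f}  class-weighted {weighted.item():.3f}")
```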
Author:
Chang, Xiangyu, Ahmed, Sk Miraj, Krishnamurthy, Srikanth V., Guler, Basak, Swami, Ananthram, Oymak, Samet, Roy-Chowdhury, Amit K.
Parameter-efficient tuning (PET) methods such as LoRA, Adapter, and Visual Prompt Tuning (VPT) have found success in enabling adaptation to new domains by tuning small modules within a transformer model. However, the number of domains encountered during…
External link:
http://arxiv.org/abs/2401.04130
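Of the PET methods named above, LoRA is the easiest to sketch: freeze the pretrained weight W and learn only a low-rank update B @ A, so each new domain costs r·(d_in + d_out) parameters instead of d_in·d_out. The rank and scaling below follow common defaults and are not taken from the paper.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():       # freeze pretrained weight and bias
            p.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scale = alpha / r

    def forward(self, x):
        # frozen base projection plus scaled low-rank update
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(64, 64))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(layer(torch.randn(2, 64)).shape, f"trainable params: {trainable}")
```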