Showing 1 - 10 of 178 for search: '"Oymak, Samet"'
Author:
Xiong, Zheyang, Cai, Ziyang, Cooper, John, Ge, Albert, Papageorgiou, Vasilis, Sifakis, Zack, Giannou, Angeliki, Lin, Ziqian, Yang, Liu, Agarwal, Saurabh, Chrysos, Grigorios G., Oymak, Samet, Lee, Kangwook, Papailiopoulos, Dimitris
Large Language Models (LLMs) have demonstrated remarkable in-context learning (ICL) capabilities. In this study, we explore a surprising phenomenon related to ICL: LLMs can perform multiple, computationally distinct ICL tasks simultaneously, during a…
External link:
http://arxiv.org/abs/2410.05603
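The snippet above describes task superposition: several distinct ICL tasks answered within a single inference call. A toy prompt construction that probes this behavior might look like the following; the two demo tasks and the "->" format are invented for illustration, not the paper's protocol.

```python
# Illustrative probe for in-context task superposition: a single prompt
# interleaves demonstrations of two computationally distinct ICL tasks
# (string uppercasing and addition), then queries both at once.
# Task choices and formatting here are hypothetical, not the paper's setup.

task_a = [("cat", "CAT"), ("dog", "DOG"), ("fish", "FISH")]  # uppercase task
task_b = [("2+3", "5"), ("7+1", "8"), ("4+4", "8")]          # addition task

def build_superposed_prompt(query_a: str, query_b: str) -> str:
    lines = []
    for (xa, ya), (xb, yb) in zip(task_a, task_b):
        lines.append(f"{xa} -> {ya}")
        lines.append(f"{xb} -> {yb}")
    lines.append(f"{query_a} ->")   # the model must infer which task applies
    lines.append(f"{query_b} ->")
    return "\n".join(lines)

print(build_superposed_prompt("bird", "5+2"))
```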
Recent research has shown that Transformers with linear attention are capable of in-context learning (ICL) by implementing a linear estimator through gradient descent steps. However, the existing results on the optimization landscape apply under stylized…
External link:
http://arxiv.org/abs/2407.10005
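The equivalence mentioned above, linear attention implementing a gradient descent step on an in-context least-squares problem, can be checked numerically. A minimal sketch: starting from w0 = 0, one GD step predicts w1·x_q = eta · Σ_i y_i (x_iᵀ x_q), which is exactly a softmax-free attention readout with keys x_i, values y_i, and query x_q. All dimensions and the learning rate are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, eta = 4, 16, 0.1
X = rng.normal(size=(n, d))        # in-context inputs
w_star = rng.normal(size=d)
y = X @ w_star                      # in-context labels
x_q = rng.normal(size=d)            # query token

# Path 1: explicit gradient step on L(w) = 0.5 * sum_i (x_i·w - y_i)^2
w1 = np.zeros(d) - eta * (X.T @ (X @ np.zeros(d) - y))
pred_gd = w1 @ x_q

# Path 2: linear attention (no softmax): value-weighted key/query inner products
pred_attn = eta * np.sum(y * (X @ x_q))

print(pred_gd, pred_attn)           # identical up to float error
assert np.allclose(pred_gd, pred_attn)
```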
The transformer architecture has catalyzed revolutionary advances in language modeling. However, recent architectural recipes, such as state-space models, have bridged the performance gap. Motivated by this, we examine the benefits of Convolution-Augmented…
External link:
http://arxiv.org/abs/2407.05591
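One common way to augment attention with convolution, sketched below, is a short depthwise causal Conv1d that mixes neighboring tokens before standard self-attention; the placement, kernel size, and residual wiring here are assumptions, not necessarily the paper's architecture.

```python
import torch
import torch.nn as nn

class ConvAugmentedBlock(nn.Module):
    def __init__(self, dim: int, heads: int = 4, kernel: int = 3):
        super().__init__()
        # depthwise conv; extra padding is trimmed to keep it causal
        self.conv = nn.Conv1d(dim, dim, kernel, groups=dim, padding=kernel - 1)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):                       # x: (batch, seq, dim)
        seq = x.size(1)
        h = self.conv(x.transpose(1, 2))[..., :seq].transpose(1, 2)  # causal trim
        h = self.norm(x + h)
        mask = torch.triu(torch.ones(seq, seq, dtype=torch.bool), 1)  # no future
        out, _ = self.attn(h, h, h, attn_mask=mask)
        return h + out

block = ConvAugmentedBlock(dim=32)
print(block(torch.randn(2, 10, 32)).shape)      # torch.Size([2, 10, 32])
```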
Recent successes in natural language processing have led to the proliferation of large language models (LLMs) by multiple providers. Each LLM offering has different inference accuracy, monetary cost, and latency, and their accuracy further depends on…
External link:
http://arxiv.org/abs/2404.13082
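The routing problem this abstract describes, trading off accuracy, monetary cost, and latency across LLM providers, can be made concrete with a toy rule: pick the cheapest model that clears an accuracy target within a latency budget. Provider names and all numbers below are fabricated for illustration.

```python
PROVIDERS = [
    # name,        est. accuracy by task,          $ / 1k tokens, latency (s)
    ("small-llm",  {"qa": 0.72, "code": 0.55}, 0.0002, 0.4),
    ("medium-llm", {"qa": 0.84, "code": 0.74}, 0.0010, 0.9),
    ("large-llm",  {"qa": 0.91, "code": 0.88}, 0.0060, 2.1),
]

def route(task: str, min_acc: float, max_latency: float) -> str:
    """Cheapest provider meeting the accuracy target under the latency cap."""
    ok = [(name, cost) for name, acc, cost, lat in PROVIDERS
          if acc.get(task, 0.0) >= min_acc and lat <= max_latency]
    if not ok:
        raise ValueError("no provider satisfies the constraints")
    return min(ok, key=lambda p: p[1])[0]

print(route("qa", min_acc=0.80, max_latency=1.0))   # -> medium-llm
```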
Transformer-based language models are trained on large datasets to predict the next token given an input sequence. Despite this simple training objective, they have led to revolutionary advances in natural language processing. Underlying this success…
External link:
http://arxiv.org/abs/2403.08081
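The training objective named above is simple to state in code: at each position the model's logits are scored against the token one step ahead. A self-contained sketch with random logits standing in for a transformer's output:

```python
import torch
import torch.nn.functional as F

vocab, seq, batch = 100, 12, 2
tokens = torch.randint(vocab, (batch, seq))    # input token ids
logits = torch.randn(batch, seq, vocab)        # stand-in model output

# shift: positions 0..seq-2 predict tokens 1..seq-1
loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, vocab),         # predictions
    tokens[:, 1:].reshape(-1),                 # next-token targets
)
print(loss.item())
```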
Modern language models rely on the transformer architecture and attention mechanism to perform language understanding and text generation. In this work, we study learning a 1-layer self-attention model from a set of prompts and associated output data…
External link:
http://arxiv.org/abs/2402.13512
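A minimal version of the setup described above, fitting a single self-attention layer to (prompt, label) pairs by gradient descent, might look like this; the synthetic data, scalar regression target, and last-token readout are illustrative choices, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

d, seq, n = 8, 6, 256
torch.manual_seed(0)
X = torch.randn(n, seq, d)                     # prompts
y = X[:, :-1].mean(dim=1) @ torch.randn(d)     # synthetic scalar labels

class OneLayerAttention(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.Wq, self.Wk, self.Wv = (nn.Linear(d, d, bias=False) for _ in range(3))
        self.readout = nn.Linear(d, 1, bias=False)

    def forward(self, x):
        # softmax attention over the prompt, then read out the final token
        att = torch.softmax(self.Wq(x) @ self.Wk(x).transpose(1, 2) / d**0.5, -1)
        return self.readout((att @ self.Wv(x))[:, -1]).squeeze(-1)

model = OneLayerAttention(d)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for step in range(200):
    loss = ((model(X) - y) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
print(f"final MSE: {loss.item():.4f}")
```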
Author:
Chang, Xiangyu, Ahmed, Sk Miraj, Krishnamurthy, Srikanth V., Guler, Basak, Swami, Ananthram, Oymak, Samet, Roy-Chowdhury, Amit K.
The key premise of federated learning (FL) is to train ML models across a diverse set of data-owners (clients), without exchanging local data. An overarching challenge to this date is client heterogeneity, which may arise not only from variations in…
External link:
http://arxiv.org/abs/2402.08769
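The FL training pattern the abstract refers to can be sketched as a bare-bones FedAvg round: each client runs local gradient steps on its own data, and the server averages the resulting weights by sample count. Client heterogeneity is simulated here with a simple label shift; everything else is a toy choice, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(1)
d, eta, local_steps = 5, 0.05, 10

def make_client(n, shift):                     # heterogeneity via label shift
    X = rng.normal(size=(n, d))
    y = X @ rng.normal(size=d) + shift
    return X, y

clients = [make_client(n, s) for n, s in [(40, 0.0), (120, 1.0), (20, -2.0)]]
w_global = np.zeros(d)

for rnd in range(5):                           # communication rounds
    updates, weights = [], []
    for X, y in clients:
        w = w_global.copy()
        for _ in range(local_steps):           # local full-batch gradient steps
            w -= eta * X.T @ (X @ w - y) / len(y)
        updates.append(w); weights.append(len(y))
    w_global = np.average(updates, axis=0, weights=weights)  # FedAvg step
    mse = np.mean([np.mean((X @ w_global - y) ** 2) for X, y in clients])
    print(f"round {rnd}: avg client MSE {mse:.3f}")
```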
Author:
Park, Jongho, Park, Jaeseung, Xiong, Zheyang, Lee, Nayoung, Cho, Jaewoong, Oymak, Samet, Lee, Kangwook, Papailiopoulos, Dimitris
State-space models (SSMs), such as Mamba (Gu & Dao, 2023), have been proposed as alternatives to Transformer networks in language modeling, by incorporating gating, convolutions, and input-dependent token selection to mitigate the quadratic cost of…
External link:
http://arxiv.org/abs/2402.04248
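The ingredients listed above, gating and input-dependent selection, reduce to a short recurrence when stripped to per-channel scalars: the state decay and write gate are functions of the current input, unlike a fixed linear SSM. The parameterization below is an invented simplification, not Mamba's actual discretized dynamics.

```python
import numpy as np

rng = np.random.default_rng(0)
d, seq = 4, 8
x = rng.normal(size=(seq, d))                  # token stream
Wa, Wb, Wg = (rng.normal(size=(d, d)) * 0.3 for _ in range(3))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

h = np.zeros(d)
outputs = []
for t in range(seq):
    a_t = sigmoid(x[t] @ Wa)                   # input-dependent state decay
    b_t = sigmoid(x[t] @ Wb)                   # input-dependent write gate
    h = a_t * h + b_t * x[t]                   # linear-time recurrence
    outputs.append(h * sigmoid(x[t] @ Wg))     # output gating
print(np.stack(outputs).shape)                 # (8, 4)
```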
Modern classification problems exhibit heterogeneities across individual classes: Each class may have unique attributes, such as sample size, label quality, or predictability (easy vs difficult), and variable importance at test-time. Without care…
External link:
http://arxiv.org/abs/2401.14343
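One standard response to the per-class heterogeneity described above is to reweight the training loss so that rare classes count more. The inverse-frequency weighting below is a common baseline, not necessarily the treatment proposed in this paper.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
counts = torch.tensor([500.0, 50.0, 5.0])        # imbalanced class sizes
weights = counts.sum() / (len(counts) * counts)  # inverse-frequency weights

logits = torch.randn(8, 3)                       # stand-in model outputs
labels = torch.randint(3, (8,))

plain = F.cross_entropy(logits, labels)
weighted = F.cross_entropy(logits, labels, weight=weights)
print(f"plain {plain.item():.3f}  class-weighted {weighted.item():.3f}")
```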
Author:
Chang, Xiangyu, Ahmed, Sk Miraj, Krishnamurthy, Srikanth V., Guler, Basak, Swami, Ananthram, Oymak, Samet, Roy-Chowdhury, Amit K.
Parameter-efficient tuning (PET) methods such as LoRA, Adapter, and Visual Prompt Tuning (VPT) have found success in enabling adaptation to new domains by tuning small modules within a transformer model. However, the number of domains encountered during…
External link:
http://arxiv.org/abs/2401.04130
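Of the PET methods named above, LoRA is the easiest to sketch: freeze the pretrained weight W and learn only a low-rank update B @ A, so each new domain costs r·(d_in + d_out) parameters instead of d_in·d_out. The rank and scaling below follow common defaults and are not taken from the paper.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():       # freeze pretrained weight and bias
            p.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scale = alpha / r

    def forward(self, x):
        # frozen base projection plus scaled low-rank update
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(64, 64))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(layer(torch.randn(2, 64)).shape, f"trainable params: {trainable}")
```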