Showing 1 - 10 of 68 results for the search: '"Li, Yingcong"'
Recent research has shown that Transformers with linear attention are capable of in-context learning (ICL) by implementing a linear estimator through gradient descent steps. However, the existing results on the optimization landscape apply under stylized…
External link:
http://arxiv.org/abs/2407.10005
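As a concrete illustration of the claim in this abstract, here is a minimal numpy sketch (not the paper's construction) showing that one gradient-descent step on the in-context least-squares loss, started from zero, coincides with a linear-attention readout over the prompt:

```python
import numpy as np

# Illustrative sketch: one GD step on the in-context least-squares
# loss L(w) = 0.5 * sum_i (w @ x_i - y_i)^2, starting from w = 0,
# equals a linear-attention readout over the (x_i, y_i) prompt pairs.
rng = np.random.default_rng(0)
d, n = 5, 32
w_star = rng.normal(size=d)                # hidden task vector
X = rng.normal(size=(n, d))                # in-context inputs
y = X @ w_star                             # in-context labels
x_q = rng.normal(size=d)                   # query input

eta = 1.0 / n
w_gd = eta * X.T @ y                       # w after one GD step from 0

# Linear attention view: the query attends to context tokens with
# un-normalized scores x_q @ x_i and reads out their labels.
pred_attention = eta * np.sum((X @ x_q) * y)

assert np.allclose(w_gd @ x_q, pred_attention)
print(pred_attention)
```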
Transformer-based language models are trained on large datasets to predict the next token given an input sequence. Despite this simple training objective, they have led to revolutionary advances in natural language processing. Underlying this success…
External link:
http://arxiv.org/abs/2403.08081
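The training objective described in this abstract is simple to state in code; the following toy sketch (illustrative, not taken from the paper) computes the next-token cross-entropy for a random sequence:

```python
import numpy as np

# Toy sketch of the next-token objective: at each position the model
# scores every vocabulary token, and training minimizes cross-entropy
# of the token that actually comes next (targets = inputs shifted by 1).
def next_token_loss(logits, tokens):
    # logits: (T, V) scores; tokens: (T,) integer token ids
    z = logits - logits.max(-1, keepdims=True)          # stable softmax
    log_probs = z - np.log(np.exp(z).sum(-1, keepdims=True))
    # position t predicts token t + 1
    return -log_probs[np.arange(len(tokens) - 1), tokens[1:]].mean()

rng = np.random.default_rng(0)
T, V = 8, 50
print(next_token_loss(rng.normal(size=(T, V)), rng.integers(0, V, size=T)))
```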
Modern language models rely on the transformer architecture and attention mechanism to perform language understanding and text generation. In this work, we study learning a 1-layer self-attention model from a set of prompts and associated output data…
External link:
http://arxiv.org/abs/2402.13512
Since its inception in "Attention Is All You Need", the transformer architecture has led to revolutionary advancements in NLP. The attention layer within the transformer admits a sequence of input tokens $X$ and makes them interact through pairwise similarities…
External link:
http://arxiv.org/abs/2308.16898
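For reference, the pairwise-similarity interaction this abstract refers to is the standard single-head attention computation; here is a minimal sketch (the parameter names Wq, Wk, Wv follow the usual convention and are not taken from the paper):

```python
import numpy as np

def softmax(z):
    z = z - z.max(-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(-1, keepdims=True)

# Single attention layer on a token sequence X (one head, no masking):
# tokens interact through the pairwise similarity matrix (X Wq)(X Wk)^T.
def attention(X, Wq, Wk, Wv):
    scores = (X @ Wq) @ (X @ Wk).T / np.sqrt(Wq.shape[1])
    return softmax(scores) @ (X @ Wv)

rng = np.random.default_rng(0)
T, d = 6, 8
X = rng.normal(size=(T, d))
out = attention(X, *(rng.normal(size=(d, d)) for _ in range(3)))
print(out.shape)  # (6, 8): one output vector per input token
```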
The attention mechanism is a central component of the transformer architecture, which led to the phenomenal success of large language models. However, the theoretical principles underlying the attention mechanism are poorly understood, especially its nonconvex…
External link:
http://arxiv.org/abs/2306.13596
Chain-of-thought (CoT) is a method that enables language models to handle complex reasoning tasks by decomposing them into simpler steps. Despite its success, the underlying mechanics of CoT are not yet fully understood. In an attempt to shed light on…
External link:
http://arxiv.org/abs/2305.18869
Author:
Li, Yingcong; Oymak, Samet
Constructing useful representations across a large number of tasks is a key requirement for sample-efficient intelligent systems. A traditional idea in multitask learning (MTL) is building a shared representation across tasks which can then be adapted…
External link:
http://arxiv.org/abs/2303.04338
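The shared-representation idea in this abstract can be illustrated in a few lines of linear algebra; in the toy model below (the names B_true and heads are illustrative, not from the paper), adapting to a task reduces to a small r-dimensional regression once the shared map is known:

```python
import numpy as np

# Sketch of shared-representation MTL: all tasks share a low-dimensional
# feature map x -> B x, and each task only fits a small head on top.
rng = np.random.default_rng(0)
d, r, n_tasks, n = 20, 3, 5, 100
B_true = rng.normal(size=(r, d))          # shared representation (r << d)
heads = rng.normal(size=(n_tasks, r))     # task-specific heads

# Given the shared map, adapting to a task is an r-dimensional regression:
X = rng.normal(size=(n, d))
y = X @ B_true.T @ heads[0]               # noiseless labels for task 0
Z = X @ B_true.T                          # r-dimensional shared features
a_hat, *_ = np.linalg.lstsq(Z, y, rcond=None)
print(np.allclose(a_hat, heads[0]))       # True: head recovered exactly
```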
The growing interest in complex decision-making and language modeling problems highlights the importance of sample-efficient learning over very long horizons. This work takes a step in this direction by investigating contextual linear bandits where the…
External link:
http://arxiv.org/abs/2302.00814
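For context, a contextual linear bandit learner typically maintains a ridge estimate of the unknown reward parameter and picks actions optimistically; the following LinUCB-style sketch is a generic illustration, not the algorithm studied in this paper:

```python
import numpy as np

# Generic contextual linear bandit sketch: rewards are linear in the
# chosen action's feature vector; the learner keeps a ridge estimate
# of the unknown parameter and adds an exploration bonus (LinUCB-style).
rng = np.random.default_rng(0)
d, K, T, lam = 5, 10, 500, 1.0
theta = rng.normal(size=d)                   # unknown reward parameter

A, b = lam * np.eye(d), np.zeros(d)
for t in range(T):
    arms = rng.normal(size=(K, d))           # context: K action features
    theta_hat = np.linalg.solve(A, b)        # ridge estimate
    bonus = np.sqrt(np.einsum('kd,dc,kc->k', arms, np.linalg.inv(A), arms))
    x = arms[np.argmax(arms @ theta_hat + 0.5 * bonus)]
    r = x @ theta + 0.1 * rng.normal()       # noisy linear reward
    A += np.outer(x, x)
    b += r * x
print(np.linalg.norm(np.linalg.solve(A, b) - theta))  # estimation error
```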
In-context learning (ICL) is a type of prompting where a transformer model operates on a sequence of (input, output) examples and performs inference on-the-fly. In this work, we formalize in-context learning as an algorithm learning problem where a transformer…
External link:
http://arxiv.org/abs/2301.07067
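The prompting setup this abstract describes is easy to picture; here is a toy prompt (contents invented purely for illustration) consisting of (input, output) examples followed by a query the model must complete on the fly:

```python
# Illustrative ICL prompt: the model sees (input, output) pairs in
# sequence and must infer the rule (here, addition) to answer the query.
examples = [("2, 3", "5"), ("7, 1", "8"), ("4, 4", "8")]
query = "5, 2"
prompt = "".join(f"Input: {x} Output: {y}\n" for x, y in examples)
prompt += f"Input: {query} Output:"
print(prompt)
```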
Unsupervised clustering algorithms for vectors have been widely used in machine learning. Many applications, including the biological data we studied in this paper, contain boundary data points that exhibit combined properties of two…
External link:
http://arxiv.org/abs/2205.09849
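One standard way to expose such boundary points, shown here as a generic sketch rather than the paper's method, is soft assignment: each point receives membership weights over clusters instead of a hard label, so a point between two clusters shows mixed properties:

```python
import numpy as np

# Soft cluster assignment: weights decay with squared distance to each
# center and are normalized per point, so boundary points get split
# membership instead of a hard label.
def soft_assign(X, centers, temp=1.0):
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / temp)
    return w / w.sum(-1, keepdims=True)      # rows sum to 1

centers = np.array([[0.0, 0.0], [4.0, 0.0]])
X = np.array([[0.1, 0.0],    # clearly cluster 0
              [3.9, 0.1],    # clearly cluster 1
              [2.0, 0.0]])   # boundary point: ~50/50 membership
print(soft_assign(X, centers).round(2))
```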