Showing 1 - 4 of 4 for search: '"Dar, Guy"'
In-context learning (ICL) has shown impressive results in few-shot learning tasks, yet its underlying mechanism is still not fully understood. A recent line of work suggests that ICL performs gradient descent (GD)-based optimization implicitly. While…
External link: http://arxiv.org/abs/2311.07772
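The mechanism this abstract describes, ICL implicitly performing gradient descent, is usually contrasted with explicit GD run on the same few-shot demonstrations. Below is a toy sketch of that explicit-GD baseline only, assuming a linear model and synthetic data that are purely illustrative and not from the paper:

```python
# Explicit gradient descent on in-context demonstrations: the baseline that
# implicit-GD accounts of ICL compare against. All data here is synthetic.
import numpy as np

rng = np.random.default_rng(0)
w_true = rng.normal(size=3)
X_demo = rng.normal(size=(8, 3))        # few-shot demonstrations
y_demo = X_demo @ w_true
x_query = rng.normal(size=3)            # the query the model must answer

w = np.zeros(3)
for _ in range(100):                    # GD on mean squared error over the demos
    grad = 2 * X_demo.T @ (X_demo @ w - y_demo) / len(y_demo)
    w -= 0.1 * grad

print("GD prediction:", x_query @ w)
print("true value:   ", x_query @ w_true)
```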
Understanding Transformer-based models has attracted significant attention, as they lie at the heart of recent technological advances across machine learning. While most interpretability methods rely on running models over inputs, recent work has shown…
External link: http://arxiv.org/abs/2209.02535
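The zero-pass direction hinted at here interprets parameters directly, with no forward pass over any input: a parameter vector is projected onto the vocabulary through the embedding matrix. A minimal sketch, assuming GPT-2 from the `transformers` library; the layer and vector indices are arbitrary illustrative choices:

```python
# Project a single FFN value vector onto the vocabulary via the embedding
# matrix, interpreting a parameter without running the model on any input.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

E = model.transformer.wte.weight               # embedding matrix, (vocab, hidden)
# Rows of the second MLP matrix are "value vectors" added to the residual
# stream; layer 5, row 42 is an arbitrary pick (Conv1D weight: (inner, hidden)).
v = model.transformer.h[5].mlp.c_proj.weight[42]

with torch.no_grad():
    scores = E @ v                             # similarity of the vector to every token
top = torch.topk(scores, 10).indices
print([tok.decode([i]) for i in top.tolist()]) # tokens this parameter vector promotes
```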
Authors: Geva, Mor, Caciularu, Avi, Dar, Guy, Roit, Paul, Sadde, Shoval, Shlain, Micah, Tamir, Bar, Goldberg, Yoav
The opaque nature and unexplained behavior of transformer-based language models (LMs) have spurred a wide interest in interpreting their predictions. However, current interpretation methods mostly focus on probing models from outside, executing behavioral…
External link: http://arxiv.org/abs/2204.12130
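Going beyond outside probing means intervening in the computation itself. A hedged sketch of one such intervention, not the tool described above: suppress a single FFN neuron in GPT-2 with a forward hook and compare the prediction before and after. The layer, neuron index, and prompt are arbitrary assumptions:

```python
# Intervene inside the forward pass: zero one FFN neuron's activation so its
# value vector contributes nothing to the residual stream, then compare outputs.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

LAYER, NEURON = 5, 42                           # illustrative choices only

def zero_neuron(module, inputs, output):
    # c_fc's output holds per-neuron pre-activations; zeroing one makes its
    # post-GELU activation zero as well, removing that sub-update entirely.
    output[..., NEURON] = 0.0
    return output

ids = tok("The capital of France is", return_tensors="pt").input_ids
with torch.no_grad():
    base = model(ids).logits[0, -1].argmax()
    handle = model.transformer.h[LAYER].mlp.c_fc.register_forward_hook(zero_neuron)
    ablated = model(ids).logits[0, -1].argmax()
    handle.remove()
print(tok.decode([base.item()]), "->", tok.decode([ablated.item()]))
```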
Following the success of dot-product attention in Transformers, numerous approximations have been recently proposed to address its quadratic complexity with respect to the input length. While these variants are memory and compute efficient, it is not…
External link: http://arxiv.org/abs/2106.06899
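One such approximation sparsifies attention by keeping only the k largest query-key scores per query before the softmax. A minimal functional sketch of top-k attention; the shapes and k are illustrative, and this is the naive form rather than a memory-efficient implementation:

```python
# Top-k attention: drop all but the k largest scores per query, then softmax.
# Output shape matches full dot-product attention, so it can stand in for it.
import torch
import torch.nn.functional as F

def topk_attention(q, k, v, topk=8):
    # q, k, v: (batch, heads, seq, dim)
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5     # (b, h, sq, sk)
    kth = torch.topk(scores, topk, dim=-1).values[..., -1:]   # k-th largest per query
    scores = scores.masked_fill(scores < kth, float("-inf"))  # mask everything smaller
    return F.softmax(scores, dim=-1) @ v

q = torch.randn(1, 4, 32, 64)
k = torch.randn(1, 4, 32, 64)
v = torch.randn(1, 4, 32, 64)
out = topk_attention(q, k, v)   # (1, 4, 32, 64), same as full attention
```

After the mask, the softmax distributes all probability mass over at most k keys per query, which is what makes the subsequent weighted sum sparse.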