Showing 1 - 4 of 4 for search: '"Dar, Guy"'
In-context learning (ICL) has shown impressive results in few-shot learning tasks, yet its underlying mechanism is still not fully understood. A recent line of work suggests that ICL performs gradient descent (GD)-based optimization implicitly. While…
External link: http://arxiv.org/abs/2311.07772
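The mechanism this abstract describes, ICL implicitly performing gradient descent, is usually contrasted with explicit GD run on the same few-shot demonstrations. Below is a toy sketch of that explicit-GD baseline only, assuming a linear model and synthetic data that are purely illustrative and not from the paper:

```python
# Explicit gradient descent on in-context demonstrations: the baseline that
# implicit-GD accounts of ICL compare against. All data here is synthetic.
import numpy as np

rng = np.random.default_rng(0)
w_true = rng.normal(size=3)
X_demo = rng.normal(size=(8, 3))        # few-shot demonstrations
y_demo = X_demo @ w_true
x_query = rng.normal(size=3)            # the query the model must answer

w = np.zeros(3)
for _ in range(100):                    # GD on mean squared error over the demos
    grad = 2 * X_demo.T @ (X_demo @ w - y_demo) / len(y_demo)
    w -= 0.1 * grad

print("GD prediction:", x_query @ w)
print("true value:   ", x_query @ w_true)
```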
Understanding Transformer-based models has attracted significant attention, as they lie at the heart of recent technological advances across machine learning. While most interpretability methods rely on running models over inputs, recent work has shown…
External link: http://arxiv.org/abs/2209.02535
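The zero-pass direction hinted at here interprets parameters directly, with no forward pass over any input: a parameter vector is projected onto the vocabulary through the embedding matrix. A minimal sketch, assuming GPT-2 from the `transformers` library; the layer and vector indices are arbitrary illustrative choices:

```python
# Project a single FFN value vector onto the vocabulary via the embedding
# matrix, interpreting a parameter without running the model on any input.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

E = model.transformer.wte.weight               # embedding matrix, (vocab, hidden)
# Rows of the second MLP matrix are "value vectors" added to the residual
# stream; layer 5, row 42 is an arbitrary pick (Conv1D weight: (inner, hidden)).
v = model.transformer.h[5].mlp.c_proj.weight[42]

with torch.no_grad():
    scores = E @ v                             # similarity of the vector to every token
top = torch.topk(scores, 10).indices
print([tok.decode([i]) for i in top.tolist()]) # tokens this parameter vector promotes
```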
Authors: Geva, Mor, Caciularu, Avi, Dar, Guy, Roit, Paul, Sadde, Shoval, Shlain, Micah, Tamir, Bar, Goldberg, Yoav
The opaque nature and unexplained behavior of transformer-based language models (LMs) have spurred a wide interest in interpreting their predictions. However, current interpretation methods mostly focus on probing models from outside, executing behavioral…
External link: http://arxiv.org/abs/2204.12130
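Going beyond outside probing means intervening in the computation itself. A hedged sketch of one such intervention, not the tool described above: suppress a single FFN neuron in GPT-2 with a forward hook and compare the prediction before and after. The layer, neuron index, and prompt are arbitrary assumptions:

```python
# Intervene inside the forward pass: zero one FFN neuron's activation so its
# value vector contributes nothing to the residual stream, then compare outputs.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

LAYER, NEURON = 5, 42                           # illustrative choices only

def zero_neuron(module, inputs, output):
    # c_fc's output holds per-neuron pre-activations; zeroing one makes its
    # post-GELU activation zero as well, removing that sub-update entirely.
    output[..., NEURON] = 0.0
    return output

ids = tok("The capital of France is", return_tensors="pt").input_ids
with torch.no_grad():
    base = model(ids).logits[0, -1].argmax()
    handle = model.transformer.h[LAYER].mlp.c_fc.register_forward_hook(zero_neuron)
    ablated = model(ids).logits[0, -1].argmax()
    handle.remove()
print(tok.decode([base.item()]), "->", tok.decode([ablated.item()]))
```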
Following the success of dot-product attention in Transformers, numerous approximations have been recently proposed to address its quadratic complexity with respect to the input length. While these variants are memory and compute efficient, it is not…
External link: http://arxiv.org/abs/2106.06899
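One such approximation sparsifies attention by keeping only the k largest query-key scores per query before the softmax. A minimal functional sketch of top-k attention; the shapes and k are illustrative, and this is the naive form rather than a memory-efficient implementation:

```python
# Top-k attention: drop all but the k largest scores per query, then softmax.
# Output shape matches full dot-product attention, so it can stand in for it.
import torch
import torch.nn.functional as F

def topk_attention(q, k, v, topk=8):
    # q, k, v: (batch, heads, seq, dim)
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5     # (b, h, sq, sk)
    kth = torch.topk(scores, topk, dim=-1).values[..., -1:]   # k-th largest per query
    scores = scores.masked_fill(scores < kth, float("-inf"))  # mask everything smaller
    return F.softmax(scores, dim=-1) @ v

q = torch.randn(1, 4, 32, 64)
k = torch.randn(1, 4, 32, 64)
v = torch.randn(1, 4, 32, 64)
out = topk_attention(q, k, v)   # (1, 4, 32, 64), same as full attention
```

After the mask, the softmax distributes all probability mass over at most k keys per query, which is what makes the subsequent weighted sum sparse.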