Showing 1 - 10 of 9,730 for search: '"attention layers"'
Author:
Katz, Shahar, Wolf, Lior
The success of Transformer-based Language Models (LMs) stems from their attention mechanism. While this mechanism has been extensively studied in explainability research, particularly through the attention values obtained during the forward pass of L…
External link:
http://arxiv.org/abs/2412.17019
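Several of the results above analyze the attention weights computed during the forward pass. As a minimal illustrative sketch of what those weights are (single-head scaled dot-product attention in NumPy, no masking or learned projections; this is a generic textbook formulation, not any listed paper's implementation):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Minimal single-head attention forward pass.

    Returns both the output and the attention weights, i.e. the
    quantities that explainability analyses typically inspect.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                       # (n_q, n_k) similarities
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V, weights                           # output, attention map
```

Each row of `weights` is a probability distribution over the keys, which is why attention maps lend themselves to inspection during the forward pass.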
The effect of regularizers such as weight decay when training deep neural networks is not well understood. We study the influence of weight decay as well as $L_2$-regularization when training neural network models in which parameter matrices interact…
External link:
http://arxiv.org/abs/2410.23819
Attention-based models, such as Transformer, excel across various tasks but lack a comprehensive theoretical understanding, especially regarding token-wise sparsity and internal linear representations. To address this gap, we introduce the single-loc…
External link:
http://arxiv.org/abs/2410.01537
Attention mechanisms are becoming increasingly popular, being used in neural network models in multiple domains such as natural language processing (NLP) and vision applications, especially at the edge. However, attention layers are difficult to map…
External link:
http://arxiv.org/abs/2405.04206
Current state-of-the-art diffusion models employ U-Net architectures containing convolutional and (qkv) self-attention layers. The U-Net processes images while being conditioned on the time embedding input for each sampling step and the class or capt…
External link:
http://arxiv.org/abs/2405.03958
Attention-based mechanisms are widely used in machine learning, most prominently in transformers. However, hyperparameters such as the rank of the attention matrices and the number of heads are scaled nearly the same way in all realizations of this a…
External link:
http://arxiv.org/abs/2407.16153
Author:
Bombari, Simone, Mondelli, Marco
Understanding the reasons behind the exceptional success of transformers requires a better analysis of why attention layers are suitable for NLP tasks. In particular, such tasks require predictive models to capture contextual meaning which often depe…
External link:
http://arxiv.org/abs/2402.02969
Earth structural heterogeneities have a remarkable role in the petroleum economy for both exploration and production projects. Automatic detection of detailed structural heterogeneities is challenging when considering modern machine learning techniqu…
External link:
http://arxiv.org/abs/2404.10170
This work presents an analysis of the effectiveness of using standard shallow feed-forward networks to mimic the behavior of the attention mechanism in the original Transformer model, a state-of-the-art architecture for sequence-to-sequence tasks. We…
External link:
http://arxiv.org/abs/2311.10642
Author:
Ilias, Loukas, Askounis, Dimitris
Published in:
Knowledge-Based Systems (Volume: 277, October 2023)
Alzheimer's disease (AD) constitutes a complex neurocognitive disease and is the main cause of dementia. Although many studies have been proposed targeting at diagnosing dementia through spontaneous speech, there are still limitations. Existing state…
External link:
http://arxiv.org/abs/2305.16406