Showing 1 - 1 of 1 for search: '"Bozic, Vukasin"'
This work presents an analysis of the effectiveness of using standard shallow feed-forward networks to mimic the behavior of the attention mechanism in the original Transformer model, a state-of-the-art architecture for sequence-to-sequence tasks. We …
External link:
http://arxiv.org/abs/2311.10642
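The abstract describes substituting the attention mechanism with a standard shallow feed-forward network. As a rough illustration of that idea only (not the paper's actual architecture or hyperparameters), the sketch below replaces self-attention with a two-layer feed-forward network operating on the flattened token sequence; the class name, `seq_len` limit, and `hidden_dim` are hypothetical choices for the example.

```python
# Minimal sketch: a shallow feed-forward block standing in for self-attention.
# Assumes a fixed maximum sequence length; all sizes here are illustrative.
import torch
import torch.nn as nn


class FeedForwardAttentionSubstitute(nn.Module):
    """Shallow FFN that maps a fixed-length token sequence to its
    "attended" representation, mimicking a self-attention block."""

    def __init__(self, seq_len: int, d_model: int, hidden_dim: int = 1024):
        super().__init__()
        self.seq_len = seq_len
        self.d_model = d_model
        self.net = nn.Sequential(
            nn.Linear(seq_len * d_model, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, seq_len * d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) -> flatten, transform, reshape back
        batch = x.size(0)
        out = self.net(x.reshape(batch, -1))
        return out.reshape(batch, self.seq_len, self.d_model)


if __name__ == "__main__":
    block = FeedForwardAttentionSubstitute(seq_len=32, d_model=64)
    tokens = torch.randn(8, 32, 64)
    print(block(tokens).shape)  # torch.Size([8, 32, 64])
```

One consequence of this design, visible in the sketch, is that the input length must be fixed in advance, unlike attention, which handles variable-length sequences natively.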