Author:
Zhang, Harvie
The self-attention mechanism utilizes large implicit weight matrices, programmed through dot product-based activations with very few trainable parameters, to enable long sequence modeling. In this paper, we investigate the possibility of discarding r[…]
External link:
http://arxiv.org/abs/2401.17948
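The abstract's phrase "large implicit weight matrices, programmed through dot product-based activations" can be illustrated with a minimal single-head self-attention sketch. This is a generic NumPy illustration of standard scaled dot-product attention, not code from the paper; the function name and shapes are assumptions made for this example.

```python
import numpy as np

def dot_product_self_attention(x, w_q, w_k, w_v):
    """Minimal single-head self-attention sketch (not from the paper).

    The n x n matrix softmax(Q K^T / sqrt(d)) is the "large implicit
    weight matrix": it is computed from the input itself via dot
    products, so the only trainable parameters are the small
    projections w_q, w_k, w_v.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                 # (n, n) implicit weights
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)      # row-wise softmax
    return attn @ v

rng = np.random.default_rng(0)
n, d = 5, 4                                       # sequence length, model dim
x = rng.standard_normal((n, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) for _ in range(3))
out = dot_product_self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (5, 4)
```

Note that the implicit n x n weight matrix grows with sequence length while the trainable parameter count (three d x d projections) stays fixed, which is the property the abstract highlights.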