Showing 1 - 2 of 2 for search: '"Shi, Jingze"'
We prove the availability of inner product form position encoding in the state space dual algorithm and study the effectiveness of different position embeddings in the hybrid quadratic causal self-attention and state space dual algorithms. We propose
External link:
http://arxiv.org/abs/2407.16958
Recent research has shown that combining Mamba with Transformer architecture, which has selective state space and quadratic self-attention mechanism, outperforms using Mamba or Transformer architecture alone in language modeling tasks. The quadratic
External link:
http://arxiv.org/abs/2406.16495