Showing 1 - 1 of 1 for search: '"Bick, Aviv"'
Transformer architectures have become a dominant paradigm for domains like language modeling but suffer in many inference settings due to their quadratic-time self-attention. Recently proposed subquadratic architectures, such as Mamba, have shown promise …
External link:
http://arxiv.org/abs/2408.10189