Showing 1 - 2 of 2 for search: '"Sarrof, Yash"'
Author:
Huang, Xinting, Yang, Andy, Bhattamishra, Satwik, Sarrof, Yash, Krebs, Andreas, Zhou, Hattie, Nakkiran, Preetum, Hahn, Michael
A major challenge for transformers is generalizing to sequences longer than those observed during training. While previous works have empirically shown that transformers can either succeed or fail at length generalization depending on the task, theoretical …
External link:
http://arxiv.org/abs/2410.02140
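The abstract's central notion, length generalization, is easy to make concrete: train on short sequences, then evaluate on strictly longer ones. Below is a minimal, hypothetical evaluation harness in Python for the parity task, a standard benchmark in this literature; it is an illustrative sketch, not this paper's experimental setup, and all names (`make_parity_example`, `evaluate`, `TRAIN_MAX`) are my own.

```python
import random

TRAIN_MAX = 16            # assumed maximum training length (illustrative)
TEST_LENS = [32, 64, 128] # strictly longer evaluation lengths

def make_parity_example(length):
    """Toy task: predict the parity of a random bit string."""
    bits = [random.randint(0, 1) for _ in range(length)]
    return bits, sum(bits) % 2

def evaluate(model, lengths, n=200):
    """Accuracy per test length; `model` maps a bit list to 0 or 1."""
    return {L: sum(model(b) == y
                   for b, y in (make_parity_example(L) for _ in range(n))) / n
            for L in lengths}

# A perfect parity solver length-generalizes trivially:
print(evaluate(lambda bits: sum(bits) % 2, TEST_LENS))
```

A model that truly length-generalizes keeps its accuracy as the test length grows well past TRAIN_MAX; a model that only memorized short-sequence patterns degrades toward chance.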
Recently, recurrent models based on linear state space models (SSMs) have shown promising performance in language modeling (LM), competitive with transformers. However, there is little understanding of the in-principle abilities of such models, which …
External link:
http://arxiv.org/abs/2405.17394
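For readers unfamiliar with linear SSMs, the recurrence the abstract refers to is h_t = A h_{t-1} + B x_t with readout y_t = C h_t. The sketch below is a plain NumPy rendering of that generic recurrence, assuming nothing beyond the standard definition; it is not any specific architecture (e.g. S4 or Mamba) analyzed in the paper, and the function and variable names are illustrative.

```python
import numpy as np

def linear_ssm(x, A, B, C):
    """Run the linear state space recurrence over an input sequence.

    x: (T, d_in) inputs; A: (n, n); B: (n, d_in); C: (d_out, n).
    Returns y: (T, d_out), where h_t = A h_{t-1} + B x_t and y_t = C h_t.
    """
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:             # recurrent scan over time steps
        h = A @ h + B @ x_t
        ys.append(C @ h)
    return np.stack(ys)

# Example: a 1-state SSM with A = 1 computes a running prefix sum.
A = np.array([[1.0]])
B = np.array([[1.0]])
C = np.array([[1.0]])
x = np.ones((5, 1))
print(linear_ssm(x, A, B, C).ravel())  # [1. 2. 3. 4. 5.]
```

Because the state update is linear in h, questions about what such models can compute in principle reduce to properties of the matrices A, B, C, which is what makes the formal-language analysis in the paper tractable.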