Authors:
Do, Giang; Le, Khiem; Pham, Quang; Nguyen, TrungTin; Doan, Thanh-Nam; Nguyen, Bint T.; Liu, Chenghao; Ramasamy, Savitha; Li, Xiaoli; Hoi, Steven
By routing input tokens to only a few split experts, Sparse Mixture-of-Experts has enabled efficient training of large language models. Recent findings suggest that fixing the routers can achieve competitive performance by alleviating the collapsing
External link:
http://arxiv.org/abs/2312.07035