Showing 1 - 3 of 3 for the search '"Krutul, Michał"'
Author:
Krajewski, Jakub, Ludziejewski, Jan, Adamczewski, Kamil, Pióro, Maciej, Krutul, Michał, Antoniak, Szymon, Ciebiera, Kamil, Król, Krystian, Odrzygóźdź, Tomasz, Sankowski, Piotr, Cygan, Marek, Jaszczur, Sebastian
Mixture of Experts (MoE) models have emerged as a primary solution for reducing the computational cost of Large Language Models. In this work, we analyze their scaling properties, incorporating an expanded range of variables. Specifically, we introduce…
External link:
http://arxiv.org/abs/2402.07871
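The teaser cuts off before naming what the paper introduces; for orientation only, here is a minimal PyTorch sketch of a standard token-choice top-k MoE feed-forward layer (the class name, the top_k=2 default, and the expert sizing are illustrative assumptions, not this paper's implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Token-choice top-k Mixture-of-Experts feed-forward layer (illustrative)."""
    def __init__(self, d_model: int, d_ff: int, n_experts: int, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        tokens = x.reshape(-1, x.size(-1))               # (n_tokens, d_model)
        logits = self.router(tokens)                     # (n_tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)   # each token picks k experts
        weights = F.softmax(weights, dim=-1)             # normalize the k gate values
        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            token_ids, slot = (idx == e).nonzero(as_tuple=True)
            if token_ids.numel():
                out[token_ids] += weights[token_ids, slot, None] * expert(tokens[token_ids])
        return out.reshape_as(x)

# Doubling n_experts while halving d_ff keeps the parameter count and
# per-token FLOPs roughly constant -- one axis along which the size and
# number of experts can be traded off at fixed compute.
layer = MoELayer(d_model=64, d_ff=256, n_experts=8)
y = layer(torch.randn(2, 10, 64))   # -> shape (2, 10, 64)
```

The loop over experts is written for clarity; production implementations batch tokens per expert and dispatch them in parallel.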
Author:
Pióro, Maciej, Ciebiera, Kamil, Król, Krystian, Ludziejewski, Jan, Krutul, Michał, Krajewski, Jakub, Antoniak, Szymon, Miłoś, Piotr, Cygan, Marek, Jaszczur, Sebastian
State Space Models (SSMs) have become serious contenders in the field of sequential modeling, challenging the dominance of Transformers. At the same time, Mixture of Experts (MoE) has significantly improved Transformer-based Large Language Models…
External link:
http://arxiv.org/abs/2401.04081
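The teaser names the two ingredients the paper combines. As a point of reference, below is a minimal sketch of the linear recurrence at the heart of state space models, assuming a simplified diagonal, non-selective parameterization; MoE-Mamba itself interleaves selective SSM (Mamba) blocks with MoE layers rather than using this naive form:

```python
import torch

def ssm_scan(x, A, B, C):
    """Linear state-space recurrence: h_t = A*h_{t-1} + B*x_t,  y_t = C*h_t.
    A, B, C are per-channel (diagonal) parameters; x has shape (seq_len, d).
    Mamba additionally makes B, C, and the step size functions of the input
    ("selection") and computes the recurrence with a fused parallel scan."""
    h = torch.zeros_like(x[0])
    ys = []
    for x_t in x:                 # naive sequential loop, for clarity only
        h = A * h + B * x_t
        ys.append(C * h)
    return torch.stack(ys)

seq_len, d = 16, 8
y = ssm_scan(torch.randn(seq_len, d),
             A=torch.full((d,), 0.9), B=torch.ones(d), C=torch.ones(d))
```

Because the recurrence is linear in h, it can be evaluated with a parallel scan in O(log seq_len) depth, which is what makes SSMs competitive with attention at long sequence lengths.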
Author:
Antoniak, Szymon, Krutul, Michał, Pióro, Maciej, Krajewski, Jakub, Ludziejewski, Jan, Ciebiera, Kamil, Król, Krystian, Odrzygóźdź, Tomasz, Cygan, Marek, Jaszczur, Sebastian
Mixture of Experts (MoE) models based on the Transformer architecture are pushing the boundaries of language and vision tasks. The allure of these models lies in their ability to substantially increase the parameter count without a corresponding increase…
External link:
http://arxiv.org/abs/2310.15961
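This paper proposes Mixture of Tokens, a continuous relative of MoE in which experts process weighted mixtures of tokens instead of discretely routed ones. The sketch below illustrates that idea at a high level; the group formation, weighting, and redistribution details here are assumptions for illustration, not the paper's exact design:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixtureOfTokens(nn.Module):
    """Continuous counterpart to discrete MoE routing (illustrative sketch):
    each expert receives a softmax-weighted mixture of the tokens in a group,
    and its output is redistributed to those tokens with the same weights."""
    def __init__(self, d_model: int, d_ff: int, n_experts: int):
        super().__init__()
        self.controller = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (group_size, d_model) -- e.g. the tokens at one sequence position
        # across a batch of examples, mixed jointly
        w = F.softmax(self.controller(x), dim=0)    # per-expert weights over tokens
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mixed = (w[:, e, None] * x).sum(dim=0)  # one blended token per expert
            out += w[:, e, None] * expert(mixed)    # send the result back to tokens
        return out

mot = MixtureOfTokens(d_model=64, d_ff=256, n_experts=4)
y = mot(torch.randn(8, 64))   # -> shape (8, 64)
```

Since every operation is differentiable and no token is dropped, the layer trains with plain backpropagation, avoiding the load-balancing losses and token dropping that discrete top-k routing typically requires.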