Showing 1 - 6 of 6 for search: '"Jaszczur, Sebastian"'
Author:
Krajewski, Jakub, Ludziejewski, Jan, Adamczewski, Kamil, Pióro, Maciej, Krutul, Michał, Antoniak, Szymon, Ciebiera, Kamil, Król, Krystian, Odrzygóźdź, Tomasz, Sankowski, Piotr, Cygan, Marek, Jaszczur, Sebastian
Mixture of Experts (MoE) models have emerged as a primary solution for reducing the computational cost of Large Language Models. In this work, we analyze their scaling properties, incorporating an expanded range of variables. Specifically, we introduce…
External link:
http://arxiv.org/abs/2402.07871
Author:
Pióro, Maciej, Ciebiera, Kamil, Król, Krystian, Ludziejewski, Jan, Krutul, Michał, Krajewski, Jakub, Antoniak, Szymon, Miłoś, Piotr, Cygan, Marek, Jaszczur, Sebastian
State Space Models (SSMs) have become serious contenders in the field of sequential modeling, challenging the dominance of Transformers. At the same time, Mixture of Experts (MoE) has significantly improved Transformer-based Large Language Models, including…
External link:
http://arxiv.org/abs/2401.04081
Author:
Staniszewski, Konrad, Tworkowski, Szymon, Jaszczur, Sebastian, Zhao, Yu, Michalewski, Henryk, Kuciński, Łukasz, Miłoś, Piotr
Recent advancements in long-context large language models have attracted significant attention, yet their practical applications often suffer from suboptimal context utilization. This study investigates structuring training data to enhance semantic interdependence…
External link:
http://arxiv.org/abs/2312.17296
Author:
Antoniak, Szymon, Krutul, Michał, Pióro, Maciej, Krajewski, Jakub, Ludziejewski, Jan, Ciebiera, Kamil, Król, Krystian, Odrzygóźdź, Tomasz, Cygan, Marek, Jaszczur, Sebastian
Mixture of Experts (MoE) models based on Transformer architecture are pushing the boundaries of language and vision tasks. The allure of these models lies in their ability to substantially increase the parameter count without a corresponding increase… (An illustrative expert-routing sketch follows this entry.)
External link:
http://arxiv.org/abs/2310.15961
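To make the conditional-computation idea in the entry above concrete, here is a minimal sketch of a top-k routed Mixture-of-Experts feed-forward layer. It is not the paper's implementation; the class name TopKMoE, the layer sizes, the number of experts, and top_k=2 are illustrative assumptions. Each token only passes through the few experts its router selects, so parameters grow with the expert count while per-token compute stays roughly constant.

# Minimal top-k MoE feed-forward layer (illustrative sketch, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):  # hypothetical name, for this example only
    def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # The router produces one score per expert for every token.
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                                   # x: (n_tokens, d_model)
        weights, idx = self.router(x).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)                # normalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                    # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

tokens = torch.randn(16, 64)
print(TopKMoE()(tokens).shape)                              # torch.Size([16, 64])

The double loop over slots and experts keeps the sketch readable; practical MoE layers instead use batched dispatch (scatter/gather) and a load-balancing loss, which are omitted here.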
Author:
Jaszczur, Sebastian, Chowdhery, Aakanksha, Mohiuddin, Afroz, Kaiser, Łukasz, Gajewski, Wojciech, Michalewski, Henryk, Kanerva, Jonni
Large Transformer models yield impressive results on many tasks, but are expensive to train, or even fine-tune, and so slow at decoding that their use and study becomes out of reach. We address this problem by leveraging sparsity. We study sparse variants…
External link:
http://arxiv.org/abs/2111.12763
We use neural graph networks with a message-passing architecture and an attention mechanism to enhance the branching heuristic in two SAT-solving algorithms. We report improvements of learned neural heuristics compared with two standard human-designed… (An illustrative message-passing sketch follows this entry.)
External link:
http://arxiv.org/abs/2005.13406
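As a rough illustration of the approach described above, and under my own assumptions rather than the paper's exact model, the sketch below runs a few rounds of attention-weighted message passing over a clause-variable incidence graph and emits one branching score per variable. The class name BranchingGNN, the embedding size, the number of rounds, and the decision to ignore literal polarity are all simplifications for the example.

# Illustrative message-passing scorer for SAT branching (not the paper's model).
import torch
import torch.nn as nn
import torch.nn.functional as F

class BranchingGNN(nn.Module):  # hypothetical name, for this example only
    def __init__(self, d=32, rounds=3):
        super().__init__()
        self.rounds = rounds
        self.var_init = nn.Parameter(torch.randn(d))    # shared initial variable embedding
        self.cls_init = nn.Parameter(torch.randn(d))    # shared initial clause embedding
        self.var_update = nn.GRUCell(d, d)
        self.cls_update = nn.GRUCell(d, d)
        self.attn = nn.Linear(2 * d, 1)                 # attention weight per (clause, variable) edge
        self.score = nn.Linear(d, 1)                    # final branching score per variable

    def forward(self, n_vars, clauses):
        # clauses: list of lists of 0-based variable indices (literal polarity ignored here).
        v = self.var_init.expand(n_vars, -1).contiguous()
        c = self.cls_init.expand(len(clauses), -1).contiguous()
        for _ in range(self.rounds):
            new_c = []
            for ci, lits in enumerate(clauses):
                neigh = v[lits]                          # embeddings of this clause's variables
                a = F.softmax(self.attn(torch.cat(
                    [c[ci].expand(len(lits), -1), neigh], dim=-1)), dim=0)
                new_c.append(self.cls_update((a * neigh).sum(0, keepdim=True), c[ci:ci + 1]))
            c = torch.cat(new_c, dim=0)
            msgs = torch.zeros_like(v)
            for ci, lits in enumerate(clauses):
                msgs[lits] += c[ci]                      # clause -> variable messages
            v = self.var_update(msgs, v)
        return self.score(v).squeeze(-1)                 # one score per variable

# Tiny example: (x0 or x1) and (not x1 or x2), represented by variable indices only.
model = BranchingGNN()
print(model(3, [[0, 1], [1, 2]]))

In a solver loop, the highest-scoring unassigned variable would be chosen at each decision point; how such scores are trained (for example, against solver outcomes) is beyond this sketch.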