Showing 1 - 6 of 6 for search: '"Jaszczur, Sebastian"'
Author:
Krajewski, Jakub, Ludziejewski, Jan, Adamczewski, Kamil, Pióro, Maciej, Krutul, Michał, Antoniak, Szymon, Ciebiera, Kamil, Król, Krystian, Odrzygóźdź, Tomasz, Sankowski, Piotr, Cygan, Marek, Jaszczur, Sebastian
Mixture of Experts (MoE) models have emerged as a primary solution for reducing the computational cost of Large Language Models. In this work, we analyze their scaling properties, incorporating an expanded range of variables. Specifically, we introduce…
External link:
http://arxiv.org/abs/2402.07871
Author:
Pióro, Maciej, Ciebiera, Kamil, Król, Krystian, Ludziejewski, Jan, Krutul, Michał, Krajewski, Jakub, Antoniak, Szymon, Miłoś, Piotr, Cygan, Marek, Jaszczur, Sebastian
State Space Models (SSMs) have become serious contenders in the field of sequential modeling, challenging the dominance of Transformers. At the same time, Mixture of Experts (MoE) has significantly improved Transformer-based Large Language Models, including…
External link:
http://arxiv.org/abs/2401.04081
Author:
Staniszewski, Konrad, Tworkowski, Szymon, Jaszczur, Sebastian, Zhao, Yu, Michalewski, Henryk, Kuciński, Łukasz, Miłoś, Piotr
Recent advancements in long-context large language models have attracted significant attention, yet their practical applications often suffer from suboptimal context utilization. This study investigates structuring training data to enhance semantic interdependence…
External link:
http://arxiv.org/abs/2312.17296
Author:
Antoniak, Szymon, Krutul, Michał, Pióro, Maciej, Krajewski, Jakub, Ludziejewski, Jan, Ciebiera, Kamil, Król, Krystian, Odrzygóźdź, Tomasz, Cygan, Marek, Jaszczur, Sebastian
Mixture of Experts (MoE) models based on Transformer architecture are pushing the boundaries of language and vision tasks. The allure of these models lies in their ability to substantially increase the parameter count without a corresponding increase… (An illustrative expert-routing sketch follows this entry.)
External link:
http://arxiv.org/abs/2310.15961
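To make the conditional-computation idea in the entry above concrete, here is a minimal sketch of a top-k routed Mixture-of-Experts feed-forward layer. It is not the paper's implementation; the class name TopKMoE, the layer sizes, the number of experts, and top_k=2 are illustrative assumptions. Each token only passes through the few experts its router selects, so parameters grow with the expert count while per-token compute stays roughly constant.

# Minimal top-k MoE feed-forward layer (illustrative sketch, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):  # hypothetical name, for this example only
    def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # The router produces one score per expert for every token.
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                                   # x: (n_tokens, d_model)
        weights, idx = self.router(x).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)                # normalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                    # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

tokens = torch.randn(16, 64)
print(TopKMoE()(tokens).shape)                              # torch.Size([16, 64])

The double loop over slots and experts keeps the sketch readable; practical MoE layers instead use batched dispatch (scatter/gather) and a load-balancing loss, which are omitted here.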
Author:
Jaszczur, Sebastian, Chowdhery, Aakanksha, Mohiuddin, Afroz, Kaiser, Łukasz, Gajewski, Wojciech, Michalewski, Henryk, Kanerva, Jonni
Large Transformer models yield impressive results on many tasks, but are expensive to train, or even fine-tune, and so slow at decoding that their use and study becomes out of reach. We address this problem by leveraging sparsity. We study sparse variants…
External link:
http://arxiv.org/abs/2111.12763
We use neural graph networks with a message-passing architecture and an attention mechanism to enhance the branching heuristic in two SAT-solving algorithms. We report improvements of learned neural heuristics compared with two standard human-designed… (An illustrative message-passing sketch follows this entry.)
External link:
http://arxiv.org/abs/2005.13406
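As a rough illustration of the approach described above, and under my own assumptions rather than the paper's exact model, the sketch below runs a few rounds of attention-weighted message passing over a clause-variable incidence graph and emits one branching score per variable. The class name BranchingGNN, the embedding size, the number of rounds, and the decision to ignore literal polarity are all simplifications for the example.

# Illustrative message-passing scorer for SAT branching (not the paper's model).
import torch
import torch.nn as nn
import torch.nn.functional as F

class BranchingGNN(nn.Module):  # hypothetical name, for this example only
    def __init__(self, d=32, rounds=3):
        super().__init__()
        self.rounds = rounds
        self.var_init = nn.Parameter(torch.randn(d))    # shared initial variable embedding
        self.cls_init = nn.Parameter(torch.randn(d))    # shared initial clause embedding
        self.var_update = nn.GRUCell(d, d)
        self.cls_update = nn.GRUCell(d, d)
        self.attn = nn.Linear(2 * d, 1)                 # attention weight per (clause, variable) edge
        self.score = nn.Linear(d, 1)                    # final branching score per variable

    def forward(self, n_vars, clauses):
        # clauses: list of lists of 0-based variable indices (literal polarity ignored here).
        v = self.var_init.expand(n_vars, -1).contiguous()
        c = self.cls_init.expand(len(clauses), -1).contiguous()
        for _ in range(self.rounds):
            new_c = []
            for ci, lits in enumerate(clauses):
                neigh = v[lits]                          # embeddings of this clause's variables
                a = F.softmax(self.attn(torch.cat(
                    [c[ci].expand(len(lits), -1), neigh], dim=-1)), dim=0)
                new_c.append(self.cls_update((a * neigh).sum(0, keepdim=True), c[ci:ci + 1]))
            c = torch.cat(new_c, dim=0)
            msgs = torch.zeros_like(v)
            for ci, lits in enumerate(clauses):
                msgs[lits] += c[ci]                      # clause -> variable messages
            v = self.var_update(msgs, v)
        return self.score(v).squeeze(-1)                 # one score per variable

# Tiny example: (x0 or x1) and (not x1 or x2), represented by variable indices only.
model = BranchingGNN()
print(model(3, [[0, 1], [1, 2]]))

In a solver loop, the highest-scoring unassigned variable would be chosen at each decision point; how such scores are trained (for example, against solver outcomes) is beyond this sketch.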