Showing 1 - 4 of 4 for search: '"Piękos, Piotr"'
Despite many recent works on Mixture of Experts (MoEs) for resource-efficient Transformer language models, existing methods mostly focus on MoEs for feedforward layers. Previous attempts at extending MoE to the self-attention layer fail to match the…
External link:
http://arxiv.org/abs/2312.07987
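
To make the contrast in the snippet concrete, the sketch below shows a minimal top-k routed feedforward MoE layer, i.e. the kind of MoE the abstract says most prior work focuses on; the class name, routing scheme, and tensor shapes are illustrative assumptions and not the attention-level MoE method of arXiv:2312.07987.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoEFeedForward(nn.Module):
    """Minimal top-k mixture-of-experts feedforward layer (illustrative sketch only)."""
    def __init__(self, d_model, d_hidden, n_experts, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)   # scores each token for every expert
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(), nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                              # x: (batch, seq, d_model)
        scores = self.router(x)                        # (batch, seq, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)     # keep only the k best experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):                     # run each selected expert and mix its output
            for e, expert in enumerate(self.experts):
                mask = idx[..., slot] == e             # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[..., slot][mask].unsqueeze(-1) * expert(x[mask])
        return out

Usage would be, for example, layer = TopKMoEFeedForward(512, 2048, n_experts=8, k=2), dropped in place of a dense feedforward block; only the k selected experts do work per token, which is where the resource savings come from.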
Author:
Zhuge, Mingchen, Liu, Haozhe, Faccio, Francesco, Ashley, Dylan R., Csordás, Róbert, Gopalakrishnan, Anand, Hamdi, Abdullah, Hammoud, Hasan Abed Al Kader, Herrmann, Vincent, Irie, Kazuki, Kirsch, Louis, Li, Bing, Li, Guohao, Liu, Shuming, Mai, Jinjie, Piękos, Piotr, Ramesh, Aditya, Schlag, Imanol, Shi, Weimin, Stanić, Aleksandar, Wang, Wenyi, Wang, Yuhui, Xu, Mengmeng, Fan, Deng-Ping, Ghanem, Bernard, Schmidhuber, Jürgen
Both Minsky's "society of mind" and Schmidhuber's "learning to think" inspire diverse societies of large multimodal neural networks (NNs) that solve problems by interviewing each other in a "mindstorm." Recent implementations of NN-based societies of…
External link:
http://arxiv.org/abs/2305.17066
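
Purely to illustrate the "mindstorm" idea of models interviewing each other, a toy round-based exchange might look like the sketch below; the Agent signature and the fixed number of rounds are assumptions made for illustration, not the protocol described in arXiv:2305.17066.

from typing import Callable, List

# Hypothetical agent interface: maps (problem, peer answers) -> revised answer.
Agent = Callable[[str, List[str]], str]

def mindstorm(problem: str, agents: List[Agent], rounds: int = 3) -> List[str]:
    """Toy exchange: every agent sees the others' latest answers and revises its own."""
    answers = [agent(problem, []) for agent in agents]         # independent first drafts
    for _ in range(rounds):
        answers = [
            agent(problem, answers[:i] + answers[i + 1:])      # each agent is "interviewed" with peer answers
            for i, agent in enumerate(agents)
        ]
    return answers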
Author:
Zawalski, Michał, Tyrolski, Michał, Czechowski, Konrad, Odrzygóźdź, Tomasz, Stachura, Damian, Piękos, Piotr, Wu, Yuhuai, Kuciński, Łukasz, Miłoś, Piotr
Complex reasoning problems contain states that vary in the computational cost required to determine a good action plan. Taking advantage of this property, we propose Adaptive Subgoal Search (AdaSubS), a search method that adaptively adjusts the planning…
External link:
http://arxiv.org/abs/2206.00702
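
As a rough illustration of adaptively adjusting the planning horizon, the sketch below prefers distant subgoals and falls back to nearer ones when they cannot be reached; the interfaces (generators, reaches, is_goal) and the greedy chaining are simplifying assumptions, not the actual AdaSubS search from arXiv:2206.00702.

from typing import Callable, List, Optional

# Hypothetical interfaces, assumed only for illustration:
#   generators: one subgoal proposer per planning distance, ordered farthest to nearest
#   reaches(state, subgoal): whether a low-level policy can get from state to subgoal

def adaptive_subgoal_step(state,
                          generators: List[Callable[[object], List[object]]],
                          reaches: Callable[[object, object], bool]) -> Optional[object]:
    """Pick the next subgoal: prefer distant subgoals, fall back to nearer ones on failure."""
    for propose in generators:                 # farthest-first: easy states get long strides
        for subgoal in propose(state):
            if reaches(state, subgoal):        # keep the first verified subgoal at this distance
                return subgoal
    return None                                # no generator produced a reachable subgoal

def plan(start, is_goal, generators, reaches, max_steps: int = 100):
    """Chain adaptive subgoal steps into a greedy plan; a toy stand-in for the full search."""
    state, path = start, [start]
    for _ in range(max_steps):
        if is_goal(state):
            return path
        nxt = adaptive_subgoal_step(state, generators, reaches)
        if nxt is None:
            return None                        # dead end under this simplified scheme
        state = nxt
        path.append(state)
    return None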
Imagine you are in a supermarket. You have two bananas in your basket and want to buy four apples. How many fruits do you have in total? This seemingly straightforward question can be challenging for data-driven language models, even if trained at scale…
External link:
http://arxiv.org/abs/2106.03921