Showing 1 - 10 of 62 for search: '"Irie, Kazuki"'
Author:
Irie, Kazuki, Lake, Brenden M.
Since the earliest proposals for neural network models of the mind and brain, critics have pointed out key weaknesses in these models compared to human cognitive abilities. Here we review recent work that has used metalearning to help overcome some of…
External link:
http://arxiv.org/abs/2410.10596
Author:
Csordás, Róbert, Irie, Kazuki, Schmidhuber, Jürgen, Potts, Christopher, Manning, Christopher D.
Previous work on Universal Transformers (UTs) has demonstrated the importance of parameter sharing across layers. By allowing recurrence in depth, UTs have advantages over standard Transformers in learning compositional generalizations, but layer-sharing… (a minimal sketch of weight sharing over depth follows this entry)
External link:
http://arxiv.org/abs/2405.16039
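As a quick illustration of the parameter-sharing idea behind Universal Transformers, here is a minimal sketch in which a single Transformer encoder layer is applied repeatedly over depth. The dimensions, number of recurrent steps, and use of PyTorch's stock encoder layer are illustrative assumptions, not the setup of the paper above.

```python
import torch
import torch.nn as nn

class TinyUT(nn.Module):
    """One shared Transformer layer reused over depth (recurrence in depth)."""
    def __init__(self, d_model=32, n_heads=4, n_steps=6):
        super().__init__()
        self.n_steps = n_steps
        # a single layer whose parameters are tied across all depth steps,
        # instead of a stack of distinct layers as in a standard Transformer
        self.shared_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=64, batch_first=True
        )

    def forward(self, x):                      # x: (batch, seq, d_model)
        for _ in range(self.n_steps):          # apply the same weights repeatedly
            x = self.shared_layer(x)
        return x

x = torch.randn(2, 10, 32)
print(TinyUT()(x).shape)                       # torch.Size([2, 10, 32])
```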
Despite the remarkable empirical performance of Transformers, their theoretical understanding remains elusive. Here, we consider a deep multi-head self-attention network that is closely related to Transformers yet analytically tractable. We develop…
External link:
http://arxiv.org/abs/2405.15926
Despite many recent works on Mixtures of Experts (MoEs) for resource-efficient Transformer language models, existing methods mostly focus on MoEs for feedforward layers. Previous attempts at extending MoE to the self-attention layer fail to match the…
External link:
http://arxiv.org/abs/2312.07987
General-purpose learning systems should improve themselves in an open-ended fashion in ever-changing environments. Conventional learning algorithms for neural networks, however, suffer from catastrophic forgetting (CF) -- previously acquired skills are…
External link:
http://arxiv.org/abs/2312.00276
Recent studies of the computational power of recurrent neural networks (RNNs) reveal a hierarchy of RNN architectures, given real-time and finite-precision assumptions. Here we study auto-regressive Transformers with linearised attention, a.k.a. linear Transformers… (a sketch of the recurrent form of linearised attention follows this entry)
External link:
http://arxiv.org/abs/2310.16076
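Linearised ("linear") attention can be computed as a recurrence over a matrix-valued state, which is why such Transformers can be analysed like RNNs. Below is a minimal single-head sketch; the feature map (elu(x)+1) and the dimensions are illustrative assumptions, not the exact construction studied in the paper above.

```python
import torch

def phi(x):
    # a simple positive feature map (assumption; one common choice is elu(x)+1)
    return torch.nn.functional.elu(x) + 1.0

def linear_attention_recurrent(q, k, v):
    # q, k, v: (seq_len, d); processes the sequence step by step like an RNN
    d = q.shape[-1]
    S = torch.zeros(d, d)        # fast-weight state: running sum of phi(k_t) v_t^T
    z = torch.zeros(d)           # normaliser state: running sum of phi(k_t)
    outs = []
    for q_t, k_t, v_t in zip(q, k, v):
        S = S + torch.outer(phi(k_t), v_t)
        z = z + phi(k_t)
        num = phi(q_t) @ S                      # (d,)
        den = (phi(q_t) @ z).clamp(min=1e-6)    # scalar normaliser
        outs.append(num / den)
    return torch.stack(outs)

q, k, v = (torch.randn(5, 8) for _ in range(3))
print(linear_attention_recurrent(q, k, v).shape)  # torch.Size([5, 8])
```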
How to reduce compute and memory requirements of neural networks (NNs) without sacrificing performance? Many recent works use sparse Mixtures of Experts (MoEs) to build resource-efficient large language models (LMs). Here we introduce several novel… (a sketch of a generic sparse MoE layer follows this entry)
External link:
http://arxiv.org/abs/2310.10837
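For readers unfamiliar with the general mechanism, here is a minimal sketch of a sparse Mixture-of-Experts feedforward layer with top-k routing: each token is sent to only k of the experts, so compute grows with k rather than with the total number of experts. Expert sizes, the value of k, and the routing details are illustrative assumptions and do not reproduce the methods proposed in the paper above.

```python
import torch
import torch.nn as nn

class SparseMoE(nn.Module):
    def __init__(self, d_model=16, d_ff=32, n_experts=4, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)   # produces routing scores
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                      # x: (n_tokens, d_model)
        scores = self.router(x)                # (n_tokens, n_experts)
        topv, topi = scores.topk(self.k, dim=-1)
        gates = torch.softmax(topv, dim=-1)    # renormalise over the selected experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = topi[:, slot] == e      # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += gates[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

x = torch.randn(10, 16)
print(SparseMoE()(x).shape)                    # torch.Size([10, 16])
```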
Real-time recurrent learning (RTRL) for sequence-processing recurrent neural networks (RNNs) offers certain conceptual advantages over backpropagation through time (BPTT). RTRL requires neither caching past activations nor truncating context, and enables… (a minimal sketch follows this entry)
External link:
http://arxiv.org/abs/2305.19044
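To make the contrast with BPTT concrete, the sketch below runs real-time recurrent learning for a deliberately simple element-wise recurrent unit: sensitivities are carried forward at every step, so no past activations are stored and no context truncation is needed. The architecture, loss, and hyperparameters are illustrative assumptions only, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
w = rng.normal(size=d) * 0.5        # recurrent weights (element-wise recurrence)
u = rng.normal(size=d) * 0.5        # input weights (element-wise)

h = np.zeros(d)
dh_dw = np.zeros(d)                 # running sensitivity dh_t/dw (diagonal case)
dh_du = np.zeros(d)                 # running sensitivity dh_t/du
lr = 0.01
target = np.ones(d)

for t in range(20):
    x = rng.normal(size=d)
    a = w * h + u * x
    h_new = np.tanh(a)
    # RTRL: propagate sensitivities forward in time using only current quantities
    sig = 1.0 - h_new ** 2          # tanh'(a)
    dh_dw = sig * (h + w * dh_dw)
    dh_du = sig * (x + w * dh_du)
    h = h_new
    # online gradient step on an instantaneous squared error at every time step
    err = h - target
    w -= lr * err * dh_dw
    u -= lr * err * dh_du

print(float(0.5 * np.sum((h - target) ** 2)))   # final instantaneous loss
```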
Author:
Zhuge, Mingchen, Liu, Haozhe, Faccio, Francesco, Ashley, Dylan R., Csordás, Róbert, Gopalakrishnan, Anand, Hamdi, Abdullah, Hammoud, Hasan Abed Al Kader, Herrmann, Vincent, Irie, Kazuki, Kirsch, Louis, Li, Bing, Li, Guohao, Liu, Shuming, Mai, Jinjie, Piękos, Piotr, Ramesh, Aditya, Schlag, Imanol, Shi, Weimin, Stanić, Aleksandar, Wang, Wenyi, Wang, Yuhui, Xu, Mengmeng, Fan, Deng-Ping, Ghanem, Bernard, Schmidhuber, Jürgen
Both Minsky's "society of mind" and Schmidhuber's "learning to think" inspire diverse societies of large multimodal neural networks (NNs) that solve problems by interviewing each other in a "mindstorm." Recent implementations of NN-based societies of…
External link:
http://arxiv.org/abs/2305.17066
Current state-of-the-art object-centric models use slots and attention-based routing for binding. However, this class of models has several conceptual limitations: the number of slots is hardwired; all slots have equal capacity; training has high computational… (a sketch of one slot-routing step follows this entry)
External link:
http://arxiv.org/abs/2305.15001
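As a rough illustration of slot-based, attention-driven routing, the sketch below performs one routing step in which inputs compete for a fixed set of slots (softmax over the slot axis) and each slot is then updated with a weighted mean of the inputs assigned to it. This is a stripped-down illustration of the general binding mechanism mentioned above, not the model proposed in the paper.

```python
import torch

def route_step(slots, inputs):
    # slots: (n_slots, d), inputs: (n_inputs, d)
    logits = slots @ inputs.T / slots.shape[-1] ** 0.5   # (n_slots, n_inputs)
    # normalise over slots so that inputs compete for slots ("routing")
    attn = torch.softmax(logits, dim=0)
    # each slot takes a weighted mean of the inputs it won
    weights = attn / attn.sum(dim=1, keepdim=True).clamp(min=1e-6)
    return weights @ inputs                              # updated slots: (n_slots, d)

slots = torch.randn(3, 16)
inputs = torch.randn(10, 16)
print(route_step(slots, inputs).shape)                   # torch.Size([3, 16])
```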