Showing 1 - 10
of 1,968
for search: '"Csordás A"'
Models that rely on subword tokenization have significant drawbacks, such as sensitivity to character-level noise like spelling errors and inconsistent compression rates across different languages and scripts. While character- or byte-level models … (see the sketch below)
External link:
http://arxiv.org/abs/2410.20771
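As a rough illustration of the robustness point above (not taken from the paper), the toy Python below contrasts a greedy subword segmenter over a small hypothetical vocabulary with byte-level tokenization; a single transposed character fragments the subword output, while the byte sequence degrades only locally.

```python
# Minimal illustration (not from the paper): byte-level tokenization is
# insensitive to character-level noise in a way fixed subword vocabularies are not.
# The toy "subword" vocabulary below is hypothetical, for illustration only.

def subword_tokenize(text, vocab):
    """Greedy longest-match segmentation over a fixed subword vocabulary."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab or j == i + 1:   # fall back to single characters
                tokens.append(piece)
                i = j
                break
    return tokens

def byte_tokenize(text):
    """Byte-level tokenization: one token per UTF-8 byte, no vocabulary needed."""
    return list(text.encode("utf-8"))

vocab = {"token", "ization", "tok", "en"}        # toy vocabulary
print(subword_tokenize("tokenization", vocab))   # ['token', 'ization']
print(subword_tokenize("tokeniaztion", vocab))   # a typo fragments into many single characters
print(byte_tokenize("tokeniaztion"))             # still one byte per character
```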
The Linear Representation Hypothesis (LRH) states that neural networks learn to encode concepts as directions in activation space, and a strong version of the LRH states that models learn only such encodings. In this paper, we present a counterexample … (see the sketch below)
External link:
http://arxiv.org/abs/2408.10920
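To make the hypothesis concrete, here is a minimal, self-contained sketch (not the paper's counterexample) of the standard way a concept direction is estimated under the LRH: as a difference of class-conditional activation means.

```python
# Hedged sketch, on synthetic data: under the LRH a concept can often be read
# out as a single direction in activation space, estimated here by the
# difference of means between concept-positive and concept-negative examples.
import numpy as np

rng = np.random.default_rng(0)
d = 64
true_direction = rng.normal(size=d)
true_direction /= np.linalg.norm(true_direction)

# Synthetic "activations": positive examples are shifted along the concept direction.
neg = rng.normal(size=(500, d))
pos = rng.normal(size=(500, d)) + 2.0 * true_direction

# Difference-of-means estimate of the concept direction.
estimate = pos.mean(axis=0) - neg.mean(axis=0)
estimate /= np.linalg.norm(estimate)

print("cosine similarity with true direction:", float(estimate @ true_direction))
# Projecting activations onto the estimated direction then separates the classes linearly.
```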
Spectral properties of bounded linear operators play a crucial role in several areas of mathematics and physics. For each self-adjoint, trace-class operator $O$ we define a set $\Lambda_n\subset \mathbb{R}$, and we show that it converges to the spectrum …
External link:
http://arxiv.org/abs/2407.04478
Published in:
Nukleonika, Vol 65, Iss 2, Pp 133-137 (2020)
According to the new European Union Basic Safety Standards (EU-BSS), preparation of the National Radon Action Plan is obligatory for the Member States. One of the plan’s aims is to carry out an indoor radon survey to identify radon-prone areas. …
External link:
https://doaj.org/article/c3ee3d2d259b4184ba7b33c5ea9e2557
Author:
Csordás, Róbert, Irie, Kazuki, Schmidhuber, Jürgen, Potts, Christopher, Manning, Christopher D.
Previous work on Universal Transformers (UTs) has demonstrated the importance of parameter sharing across layers. By allowing recurrence in depth, UTs have advantages over standard Transformers in learning compositional generalizations, but layer-sharing … (see the sketch below)
External link:
http://arxiv.org/abs/2405.16039
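For context on what parameter sharing across layers means here, the following toy sketch (not the paper's architecture) contrasts a stack of distinct residual blocks with a single shared block applied repeatedly, i.e. recurrence in depth.

```python
# Minimal sketch of the idea behind Universal Transformers: a standard
# Transformer stacks L distinct layers, while a UT applies one shared layer
# L times. The "layer" is a toy residual block, not the paper's architecture.
import numpy as np

rng = np.random.default_rng(0)
d = 16

def make_layer():
    W1 = rng.normal(scale=0.1, size=(d, d))
    W2 = rng.normal(scale=0.1, size=(d, d))
    def layer(x):
        return x + np.tanh(x @ W1) @ W2   # residual block standing in for attention + FFN
    return layer

x = rng.normal(size=(4, d))               # 4 token positions, d-dim states

# Standard Transformer: 6 independent layers, 6 * (2 d^2) parameters.
standard = [make_layer() for _ in range(6)]
h = x
for layer in standard:
    h = layer(h)

# Universal Transformer: one shared layer reused 6 times, only 2 d^2 parameters.
shared = make_layer()
u = x
for _ in range(6):
    u = shared(u)

print(h.shape, u.shape)  # same interface; the UT variant ties weights across depth
```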
Positivity preservation is an important issue in the dynamics of open quantum systems: positivity violations always mark the border of validity of the model. We investigate the positivity of self-adjoint polynomial Gaussian integral operators …
External link:
http://arxiv.org/abs/2405.04438
Despite many recent works on Mixture of Experts (MoEs) for resource-efficient Transformer language models, existing methods mostly focus on MoEs for feedforward layers. Previous attempts at extending MoE to the self-attention layer fail to match the …
External link:
http://arxiv.org/abs/2312.07987
General-purpose learning systems should improve themselves in open-ended fashion in ever-changing environments. Conventional learning algorithms for neural networks, however, suffer from catastrophic forgetting (CF) -- previously acquired skills are …
External link:
http://arxiv.org/abs/2312.00276
Recent studies of the computational power of recurrent neural networks (RNNs) reveal a hierarchy of RNN architectures, given real-time and finite-precision assumptions. Here we study auto-regressive Transformers with linearised attention, a.k.a. linear Transformers … (see the sketch below)
External link:
http://arxiv.org/abs/2310.16076
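The equivalence that makes this line of analysis possible can be shown in a few lines: without the softmax, causal attention can be computed with a running matrix-valued state, so a linear Transformer behaves like an RNN with update S_t = S_{t-1} + k_t v_t^T and read-out q_t S_t (up to normalisation). A minimal numerical check, not specific to the paper:

```python
# Linearised ("linear") attention: the quadratic causal form and the recurrent,
# constant-memory form compute the same outputs once the softmax is dropped.
import numpy as np

rng = np.random.default_rng(0)
T, d = 8, 4
Q, K, V = (rng.normal(size=(T, d)) for _ in range(3))

# Causal attention without softmax, computed the quadratic way...
out_quadratic = np.tril(Q @ K.T) @ V

# ...and the same computation as a recurrent state update (an RNN in disguise).
S = np.zeros((d, d))
out_recurrent = np.zeros((T, d))
for t in range(T):
    S += np.outer(K[t], V[t])        # fast-weight / state update: S_t = S_{t-1} + k_t v_t^T
    out_recurrent[t] = Q[t] @ S      # read-out with the current query

print(np.allclose(out_quadratic, out_recurrent))  # True: the two forms agree
```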
How to reduce compute and memory requirements of neural networks (NNs) without sacrificing performance? Many recent works use sparse Mixtures of Experts (MoEs) to build resource-efficient large language models (LMs). Here we introduce several novel … (see the sketch below)
External link:
http://arxiv.org/abs/2310.10837
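For readers unfamiliar with the setting, the sketch below shows a generic sparse MoE feedforward layer with top-k routing; the sizes and routing details are illustrative only and are not the paper's proposed techniques.

```python
# Hedged sketch of a sparse Mixture-of-Experts feedforward layer with top-k
# routing: each token is processed by only a few experts, so compute per token
# stays roughly top_k / n_experts of the equivalent dense layer.
import numpy as np

rng = np.random.default_rng(0)
d, d_ff, n_experts, top_k = 16, 32, 8, 2

W_router = rng.normal(scale=0.1, size=(d, n_experts))
experts = [(rng.normal(scale=0.1, size=(d, d_ff)),
            rng.normal(scale=0.1, size=(d_ff, d))) for _ in range(n_experts)]

def moe_ffn(x):
    """x: (tokens, d). Each token is routed to its top_k experts only."""
    logits = x @ W_router                              # (tokens, n_experts) router scores
    chosen = np.argsort(-logits, axis=1)[:, :top_k]    # indices of the selected experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        gates = logits[t, chosen[t]]
        gates = np.exp(gates - gates.max())
        gates /= gates.sum()                           # softmax over the selected experts
        for g, e in zip(gates, chosen[t]):
            W1, W2 = experts[e]
            out[t] += g * (np.maximum(x[t] @ W1, 0.0) @ W2)  # gated ReLU expert FFN
    return out

y = moe_ffn(rng.normal(size=(5, d)))
print(y.shape)  # (5, 16): same interface as a dense feedforward layer
```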