Showing 1 - 10 of 1,847 for search: '"A. Csordás"'
Author:
A. Csordás, M. Al-Dalahmeh
Published in:
Финансы: теория и практика, Vol 0, Iss 0 (2024)
Legitimacy theory posits that organizations strive to align with societal expectations to gain advantages, yet its focus has primarily been at the company level. The purpose of the study is to investigate the global applicability of legitimacy theory…
External link:
https://doaj.org/article/3465dd2da2c149548f755849d40f0742
Models that rely on subword tokenization have significant drawbacks, such as sensitivity to character-level noise like spelling errors and inconsistent compression rates across different languages and scripts. While character- or byte-level models…
External link:
http://arxiv.org/abs/2410.20771
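Below is a minimal Python sketch of the sensitivity the abstract describes: a toy greedy subword tokenizer (the vocabulary and the "##" continuation marker are illustrative assumptions, not the tokenizer studied in the paper) fragments under a single typo, whereas a byte-level view changes only one byte.

# Toy comparison: subword segmentation vs. byte-level tokenization under a typo.
# TOY_VOCAB and the greedy matcher are illustrative assumptions, not the paper's setup.
TOY_VOCAB = ["lang", "uage", "mod", "el", "##s"]

def toy_subword_tokenize(word, vocab=TOY_VOCAB):
    """Greedy longest-match segmentation; unknown spans fall back to single characters."""
    tokens, i = [], 0
    while i < len(word):
        match = next((v for v in sorted(vocab, key=len, reverse=True)
                      if word.startswith(v.lstrip("#"), i)), None)
        if match:
            tokens.append(match)
            i += len(match.lstrip("#"))
        else:
            tokens.append(word[i])  # character fallback
            i += 1
    return tokens

def byte_tokenize(word):
    return list(word.encode("utf-8"))  # one token per byte, for any language or script

print(toy_subword_tokenize("language"))   # ['lang', 'uage']
print(toy_subword_tokenize("lanugage"))   # typo -> fragments into many single characters
print(byte_tokenize("language"))          # byte stream; a typo changes exactly one byte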
The Linear Representation Hypothesis (LRH) states that neural networks learn to encode concepts as directions in activation space, and a strong version of the LRH states that models learn only such encodings. In this paper, we present a counterexample…
External link:
http://arxiv.org/abs/2408.10920
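A minimal numpy sketch of what the LRH asserts: a binary concept that can be read off by projecting hidden states onto a single direction. The synthetic activations and the probe below are illustrative assumptions; the paper's counterexample is precisely a case that this kind of single-direction readout cannot capture.

# Synthetic illustration of a concept encoded as a direction in activation space.
import numpy as np

rng = np.random.default_rng(0)
d = 64
concept_dir = rng.normal(size=d)
concept_dir /= np.linalg.norm(concept_dir)

# Fake "activations": the binary concept label is encoded linearly along concept_dir.
labels = rng.integers(0, 2, size=200)
acts = rng.normal(size=(200, d)) + np.outer(2.0 * labels - 1.0, concept_dir)

# A linear readout (projection onto the direction) recovers the concept.
proj = acts @ concept_dir
acc = np.mean((proj > 0).astype(int) == labels)
print(f"linear readout accuracy: {acc:.2f}")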
Spectral properties of bounded linear operators play a crucial role in several areas of mathematics and physics. For each self-adjoint, trace-class operator $O$ we define a set $\Lambda_n\subset \mathbb{R}$, and we show that it converges to the spectrum…
External link:
http://arxiv.org/abs/2407.04478
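As a hedged numerical illustration (not the paper's construction of $\Lambda_n$): finite discretisations of a self-adjoint, trace-class integral operator give eigenvalue sets that stabilise toward its spectrum as the discretisation is refined. The Gaussian kernel and midpoint quadrature below are assumptions chosen only to make that idea concrete.

# Eigenvalues of growing Nystrom discretisations of a trace-class kernel operator.
import numpy as np

def nystrom_eigs(n):
    x = (np.arange(n) + 0.5) / n                      # quadrature nodes on [0, 1]
    K = np.exp(-(x[:, None] - x[None, :]) ** 2) / n   # kernel matrix with weights 1/n
    return np.sort(np.linalg.eigvalsh(K))[::-1]

for n in (20, 80, 320):
    print(n, nystrom_eigs(n)[:4])  # leading eigenvalues stabilise as n grows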
Author:
Csordás, Róbert, Irie, Kazuki, Schmidhuber, Jürgen, Potts, Christopher, Manning, Christopher D.
Previous work on Universal Transformers (UTs) has demonstrated the importance of parameter sharing across layers. By allowing recurrence in depth, UTs have advantages over standard Transformers in learning compositional generalizations, but layer-sharing…
External link:
http://arxiv.org/abs/2405.16039
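A small numpy sketch of the parameter-sharing idea behind UTs: the same block applied recurrently in depth, versus a stack of blocks with distinct parameters. The toy block (a single tanh layer) and all shapes are illustrative assumptions, not the architecture proposed in the paper.

# Shared-parameter recurrence in depth vs. a standard stack of distinct layers.
import numpy as np

rng = np.random.default_rng(0)
d = 16

def block(x, W):
    return np.tanh(x @ W)          # stand-in for a full Transformer layer

x = rng.normal(size=(4, d))        # 4 tokens, width d

# Standard Transformer: 6 distinct parameter matrices.
Ws = [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(6)]
h_standard = x
for W in Ws:
    h_standard = block(h_standard, W)

# Universal Transformer: the same parameters reused at every depth step.
W_shared = rng.normal(size=(d, d)) / np.sqrt(d)
h_universal = x
for _ in range(6):
    h_universal = block(h_universal, W_shared)

print(h_standard.shape, h_universal.shape)  # same compute pattern, ~1/6 the parameters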
Positivity preservation is an important issue in the dynamics of open quantum systems: positivity violations always mark the border of validity of the model. We investigate the positivity of self-adjoint polynomial Gaussian integral operators…
External link:
http://arxiv.org/abs/2405.04438
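A hedged numerical sketch, not the paper's criterion: positivity of a self-adjoint integral operator can be probed by checking the smallest eigenvalue of a discretised kernel matrix. The polynomial-Gaussian kernel below is an assumption chosen so that one case stays positive and the other does not.

# Smallest eigenvalue of a discretised kernel as a crude positivity check.
import numpy as np

def min_eig(kernel, n=200):
    x = np.linspace(-3, 3, n)
    K = kernel(x[:, None], x[None, :]) * (x[1] - x[0])  # quadrature-weighted kernel matrix
    return np.linalg.eigvalsh((K + K.T) / 2).min()

gauss = lambda x, y: np.exp(-(x - y) ** 2)
poly_gauss = lambda x, y: (1 - 2 * x * y) * np.exp(-(x - y) ** 2)

print(min_eig(gauss))       # >= 0 up to numerical error: positive operator
print(min_eig(poly_gauss))  # negative: positivity is violated for this kernel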
Despite many recent works on Mixture of Experts (MoEs) for resource-efficient Transformer language models, existing methods mostly focus on MoEs for feedforward layers. Previous attempts at extending MoE to the self-attention layer fail to match the…
External link:
http://arxiv.org/abs/2312.07987
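A toy sketch (an assumption for orientation, not this paper's architecture) of what routing inside self-attention can look like: each token picks one of several value projections while queries and keys stay dense.

# Top-1 expert routing applied to the value projection of a single attention layer.
import numpy as np

rng = np.random.default_rng(0)
T, d, n_experts = 5, 8, 4

x = rng.normal(size=(T, d))
Wq = rng.normal(size=(d, d))
Wk = rng.normal(size=(d, d))
Wv_experts = rng.normal(size=(n_experts, d, d))   # one value projection per expert
router = rng.normal(size=(d, n_experts))

q, k = x @ Wq, x @ Wk
expert_ids = np.argmax(x @ router, axis=-1)       # top-1 routing per token
v = np.stack([x[t] @ Wv_experts[expert_ids[t]] for t in range(T)])

attn = np.exp(q @ k.T / np.sqrt(d))
attn /= attn.sum(-1, keepdims=True)
out = attn @ v
print(out.shape)  # (T, d): attention output with sparsely selected value experts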
General-purpose learning systems should improve themselves in open-ended fashion in ever-changing environments. Conventional learning algorithms for neural networks, however, suffer from catastrophic forgetting (CF) -- previously acquired skills are…
External link:
http://arxiv.org/abs/2312.00276
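A tiny illustration of catastrophic forgetting in general (not this paper's method): a linear model fit by gradient descent on task B loses its fit to the earlier task A. The synthetic tasks and hyperparameters are illustrative assumptions.

# Sequential training on two synthetic regression tasks; the fit to task A degrades.
import numpy as np

rng = np.random.default_rng(0)
d = 20
w_a, w_b = rng.normal(size=d), rng.normal(size=d)            # ground-truth task solutions
Xa, Xb = rng.normal(size=(200, d)), rng.normal(size=(200, d))
ya, yb = Xa @ w_a, Xb @ w_b

def gd(X, y, w, steps=300, lr=0.1):
    for _ in range(steps):
        w = w - lr * X.T @ (X @ w - y) / len(y)              # full-batch gradient step
    return w

w = gd(Xa, ya, np.zeros(d))
print("task A error after A:", np.mean((Xa @ w - ya) ** 2))  # near zero
w = gd(Xb, yb, w)
print("task A error after B:", np.mean((Xa @ w - ya) ** 2))  # large: skill forgotten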
Recent studies of the computational power of recurrent neural networks (RNNs) reveal a hierarchy of RNN architectures, given real-time and finite-precision assumptions. Here we study auto-regressive Transformers with linearised attention, a.k.a. linear Transformers…
External link:
http://arxiv.org/abs/2310.16076
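A minimal numpy sketch of why linearised attention admits an RNN view, which is what places these models in a finite-precision RNN hierarchy: the attention state is a d x d matrix updated additively at each step. The positive feature map phi is an illustrative assumption.

# Linear attention computed as a recurrent scan over time steps.
import numpy as np

rng = np.random.default_rng(0)
T, d = 6, 8
Q, K, V = (rng.normal(size=(T, d)) for _ in range(3))
phi = lambda u: np.maximum(u, 0.0) + 1.0   # positive feature map keeps the denominator > 0

S = np.zeros((d, d))                       # recurrent "fast weight" state
z = np.zeros(d)                            # normaliser state
outs = []
for t in range(T):
    S += np.outer(phi(K[t]), V[t])         # accumulate key-value outer products
    z += phi(K[t])
    outs.append(phi(Q[t]) @ S / (phi(Q[t]) @ z))
print(np.stack(outs).shape)                # (T, d) outputs produced by an RNN-style scan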
How to reduce compute and memory requirements of neural networks (NNs) without sacrificing performance? Many recent works use sparse Mixtures of Experts (MoEs) to build resource-efficient large language models (LMs). Here we introduce several novel…
External link:
http://arxiv.org/abs/2310.10837
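For orientation, a generic top-k-routed feedforward MoE sketch (the paper's specific methods are not reproduced here): each token activates only k of the n_experts expert FFNs, which is where the compute and memory savings come from. All sizes and the routing scheme are illustrative assumptions.

# Sparse feedforward Mixture of Experts with top-k routing per token.
import numpy as np

rng = np.random.default_rng(0)
T, d, d_ff, n_experts, k = 10, 16, 32, 8, 2

x = rng.normal(size=(T, d))
W_router = rng.normal(size=(d, n_experts))
W1 = rng.normal(size=(n_experts, d, d_ff))
W2 = rng.normal(size=(n_experts, d_ff, d))

logits = x @ W_router
topk = np.argsort(logits, axis=-1)[:, -k:]                 # k experts per token
gates = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)

y = np.zeros_like(x)
for t in range(T):
    for e in topk[t]:
        h = np.maximum(x[t] @ W1[e], 0.0)                  # expert FFN (ReLU)
        y[t] += gates[t, e] * (h @ W2[e])
print(y.shape)  # (T, d): each token touched only k of the n_experts FFNs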