Showing 1 - 10 of 12,241 for search: '"Sepp A"'
While Transformers and other sequence-parallelizable neural network architectures seem like the current state of the art in sequence modeling, they specifically lack state-tracking capabilities. These are important for time-series tasks and logical reasoning…
External link:
http://arxiv.org/abs/2412.07752
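To make the state-tracking point concrete, here is a minimal illustrative sketch (not from the paper): computing the parity of a bit stream requires carrying state across the entire sequence, which a recurrent update handles naturally but a single parallel pass over tokens does not.

```python
# Illustrative sketch (not from the paper): state tracking via a recurrent update.
# The parity of a bit stream depends on the whole prefix, so the model must
# carry state across time steps -- exactly what a recurrent cell does.

def parity(bits):
    state = 0  # one bit of hidden state
    for b in bits:
        state ^= b  # recurrent update: new state depends on old state and input
    return state

assert parity([1, 0, 1, 1]) == 1  # odd number of ones
assert parity([1, 1, 0, 0]) == 0  # even number of ones
```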
Authors:
Schmidinger, Niklas, Schneckenreiter, Lisa, Seidl, Philipp, Schimunek, Johannes, Hoedt, Pieter-Jan, Brandstetter, Johannes, Mayr, Andreas, Luukkonen, Sohvi, Hochreiter, Sepp, Klambauer, Günter
Language models for biological and chemical sequences enable crucial applications such as drug discovery, protein engineering, and precision medicine. Currently, these language models are predominantly based on Transformer architectures. While Transformers…
External link:
http://arxiv.org/abs/2411.04165
Authors:
Schmied, Thomas, Adler, Thomas, Patil, Vihang, Beck, Maximilian, Pöppel, Korbinian, Brandstetter, Johannes, Klambauer, Günter, Pascanu, Razvan, Hochreiter, Sepp
In recent years, there has been a trend in the field of Reinforcement Learning (RL) towards large action models trained offline on large-scale datasets via sequence modeling. Existing models are primarily based on the Transformer architecture, which…
External link:
http://arxiv.org/abs/2410.22391
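For context, here is a minimal sketch of the sequence-modeling view of offline RL that such action models build on (Decision-Transformer style); the tokenization details below are illustrative assumptions, not the paper's exact format.

```python
# Hedged sketch of offline RL as sequence modeling (Decision-Transformer style):
# trajectories are flattened into (return-to-go, state, action) tokens, and a
# sequence model is trained to predict actions. All names are illustrative.

def to_sequence(trajectory):
    """Flatten a trajectory of (state, action, reward) into token triples."""
    rewards = [r for (_, _, r) in trajectory]
    sequence = []
    for t, (state, action, _) in enumerate(trajectory):
        rtg = sum(rewards[t:])            # return-to-go from step t onward
        sequence += [rtg, state, action]  # one token triple per time step
    return sequence

# Example: two steps of (state, action, reward)
print(to_sequence([(0, 1, 0.5), (2, 0, 1.0)]))
# -> [1.5, 0, 1, 1.0, 2, 0]
```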
Ensembles of Deep Neural Networks, Deep Ensembles, are widely used as a simple way to boost predictive performance. However, their impact on algorithmic fairness is not well understood yet. Algorithmic fairness investigates how a model's performance…
External link:
http://arxiv.org/abs/2410.13831
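As background, a minimal sketch of how a Deep Ensemble forms predictions, assuming classifiers that output class probabilities; the stub models below are placeholders.

```python
import numpy as np

# Hedged sketch of a Deep Ensemble at prediction time: average the class
# probabilities of independently trained models. Models are stubbed here.

def ensemble_predict(models, x):
    probs = np.stack([m(x) for m in models])  # (n_models, n_classes)
    return probs.mean(axis=0)                 # averaged predictive distribution

# Three stub "models" returning class probabilities for some input x
models = [lambda x: np.array([0.7, 0.3]),
          lambda x: np.array([0.6, 0.4]),
          lambda x: np.array([0.9, 0.1])]
print(ensemble_predict(models, x=None))  # -> [0.733..., 0.266...]
```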
Reliable estimation of predictive uncertainty is crucial for machine learning applications, particularly in high-stakes scenarios where hedging against risks is essential. Despite its significance, a consensus on the correct measurement of predictive uncertainty…
External link:
http://arxiv.org/abs/2410.10786
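One widely used (and, as the abstract suggests, debated) measurement splits an ensemble's total predictive entropy into an aleatoric and an epistemic part; a minimal sketch:

```python
import numpy as np

# Hedged sketch of one common entropy-based decomposition of predictive
# uncertainty for an ensemble: total entropy = expected (aleatoric) entropy
# + mutual information (the epistemic, disagreement-driven part).

def entropy(p):
    p = np.clip(p, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=-1)

probs = np.array([[0.9, 0.1],    # per-member class probabilities
                  [0.6, 0.4],
                  [0.8, 0.2]])
total = entropy(probs.mean(axis=0))   # entropy of the mean prediction
aleatoric = entropy(probs).mean()     # mean of per-member entropies
epistemic = total - aleatoric         # mutual information (>= 0 by Jensen)
print(total, aleatoric, epistemic)
```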
Authors:
Paischer, Fabian, Hauzenberger, Lukas, Schmied, Thomas, Alkin, Benedikt, Deisenroth, Marc Peter, Hochreiter, Sepp
Foundation models (FMs) are pre-trained on large-scale datasets and then fine-tuned on a downstream task for a specific application. The most successful and most commonly used fine-tuning method is to update the pre-trained weights via a low-rank adaptation…
External link:
http://arxiv.org/abs/2410.07170
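As background on the fine-tuning method the abstract refers to, a minimal NumPy sketch of low-rank adaptation (LoRA); the dimensions and initialization scale below are illustrative.

```python
import numpy as np

# Hedged sketch of low-rank adaptation (LoRA): instead of updating the full
# pre-trained weight W, learn a low-rank correction B @ A with rank r << d, k.

d, k, r = 64, 64, 4
rng = np.random.default_rng(0)
W = rng.normal(size=(d, k))          # frozen pre-trained weights
A = rng.normal(size=(r, k)) * 0.01   # trainable rank-r factor
B = np.zeros((d, r))                 # trainable; zero init so W' = W at start

def adapted_forward(x):
    return x @ (W + B @ A).T         # only A and B would receive gradients

x = rng.normal(size=(1, k))
print(adapted_forward(x).shape)      # -> (1, 64)
```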
Authors:
Schmied, Thomas, Paischer, Fabian, Patil, Vihang, Hofmarcher, Markus, Pascanu, Razvan, Hochreiter, Sepp
In-context learning (ICL) is the ability of a model to learn a new task by observing a few exemplars in its context. While prevalent in NLP, this capability has recently also been observed in Reinforcement Learning (RL) settings. Prior in-context RL…
External link:
http://arxiv.org/abs/2410.07071
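To illustrate the in-context setup the abstract describes, a hedged sketch of how exemplar transitions could be packed into a frozen sequence model's context; the flat token layout is an assumption for illustration, not the paper's method.

```python
# Hedged sketch of in-context RL: instead of updating weights, a frozen
# sequence model is conditioned on a few exemplar transitions from the new
# task placed in its context. Names and layout are illustrative.

def build_context(exemplars, query_state):
    """Concatenate (state, action, reward) exemplars, then the query state."""
    context = []
    for state, action, reward in exemplars:
        context += [state, action, reward]
    context.append(query_state)  # the model predicts the next action from here
    return context

demo = [(0, 1, 0.0), (1, 1, 1.0)]   # two transitions from the new task
print(build_context(demo, query_state=0))
# -> [0, 1, 0.0, 1, 1, 1.0, 0]
```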
Humans excel at abstracting data and constructing reusable concepts, a capability lacking in current continual learning systems. The field of object-centric learning addresses this by developing abstract representations, or slots, from data…
External link:
http://arxiv.org/abs/2410.00728
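As a rough illustration of the slot idea, here is a heavily simplified sketch (omitting the learned projections and recurrent updates of actual slot-attention methods) in which a fixed number of slot vectors compete for input features.

```python
import numpy as np

# Hedged, heavily simplified sketch of slots in object-centric learning:
# slot vectors compete (softmax over slots) for input features and are
# updated to summarize the features they win.

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def refine_slots(features, slots, iters=3):
    for _ in range(iters):
        attn = softmax(features @ slots.T, axis=1)     # slots compete per feature
        attn = attn / attn.sum(axis=0, keepdims=True)  # normalize weights per slot
        slots = attn.T @ features                      # slot = weighted feature mean
    return slots

rng = np.random.default_rng(0)
features = rng.normal(size=(10, 4))         # e.g. 10 image-patch embeddings
slots = rng.normal(size=(3, 4))             # 3 reusable slots
print(refine_slots(features, slots).shape)  # -> (3, 4)
```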
Learning agents with reinforcement learning is difficult when dealing with long trajectories that involve a large number of states. To address these learning problems effectively, the number of states can be reduced by abstract representations that…
External link:
http://arxiv.org/abs/2410.00704
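To illustrate state abstraction in general terms, a hedged sketch that maps many raw states to a small set of abstract states by nearest-centroid assignment; the centroids here are placeholders (e.g., from k-means), not the paper's construction.

```python
import numpy as np

# Hedged sketch of state abstraction for RL: shrink a large state space by
# assigning similar state embeddings to a shared abstract state index.

def abstract_states(states, centroids):
    """Map each state to the index of its nearest centroid (abstract state)."""
    dists = np.linalg.norm(states[:, None, :] - centroids[None, :, :], axis=-1)
    return dists.argmin(axis=1)

rng = np.random.default_rng(0)
states = rng.normal(size=(1000, 8))   # many raw state embeddings
centroids = states[:16]               # 16 abstract states (placeholder centroids)
print(abstract_states(states, centroids).shape)  # -> (1000,) indices in [0, 16)
```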
Authors:
Pandurov, Milan, Humbel, Lukas, Sepp, Dmitry, Ttofari, Adamos, Thomm, Leon, Quoc, Do Le, Chandrasekaran, Siddharth, Santhanam, Sharan, Ye, Chuan, Bergman, Shai, Wang, Wei, Lundgren, Sven, Sagonas, Konstantinos, Ros, Alberto
Memory has become the primary cost driver in cloud data centers. Yet, a significant portion of memory allocated to VMs in public clouds remains unused. To optimize this resource, "cold" memory can be reclaimed from VMs and stored on slower storage or…
External link:
http://arxiv.org/abs/2409.13327