Showing 1 - 10 of 1,381 results for search: '"Sander, Michael"'
Transformers have achieved state-of-the-art performance in language modeling tasks. However, the reasons behind their tremendous success are still unclear. In this paper, towards a better understanding, we train a Transformer model on a simple next token prediction task…
External link:
http://arxiv.org/abs/2402.05787
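The "next token prediction task" that this abstract refers to can be made concrete with a minimal sketch (illustrative only, not the paper's actual setup): every prefix of a training sequence becomes an input, and the token that follows it becomes the target.

```python
# Minimal illustration of next-token prediction training pairs
# (a generic sketch; the token values here are made up).
tokens = [3, 1, 4, 1, 5]
pairs = [(tokens[:t], tokens[t]) for t in range(1, len(tokens))]
print(pairs)
# -> [([3], 1), ([3, 1], 4), ([3, 1, 4], 1), ([3, 1, 4, 1], 5)]
```

A language model is then trained to maximize the likelihood of each target given its prefix.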
Residual neural networks are state-of-the-art deep learning models. Their continuous-depth analog, neural ordinary differential equations (ODEs), are also widely used. Despite their success, the link between the discrete and continuous models still lacks a solid mathematical foundation…
External link:
http://arxiv.org/abs/2309.01213
The top-k operator returns a sparse vector, where the non-zero values correspond to the k largest values of the input. Unfortunately, because it is a discontinuous function, it is difficult to incorporate in neural networks trained end-to-end with backpropagation…
External link:
http://arxiv.org/abs/2302.01425
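To see why the hard top-k operator is discontinuous, consider a plain NumPy version (an illustrative sketch, not the differentiable relaxation the paper proposes): which entries survive depends on a strict ranking, so an arbitrarily small perturbation of the input can swap the kept entries and make the output jump.

```python
import numpy as np

def top_k(x, k):
    """Hard top-k: keep the k largest entries of x, zero out the rest."""
    out = np.zeros_like(x)
    idx = np.argpartition(x, -k)[-k:]  # indices of the k largest values
    out[idx] = x[idx]
    return out

x = np.array([0.1, 3.0, -1.0, 2.5])
print(top_k(x, 2))  # keeps 3.0 and 2.5, zeros elsewhere
```

For example, nudging the input so that the second- and third-largest values trade places changes the support of the output abruptly, which is exactly what breaks gradient-based training.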
Vision Transformers (ViTs) have achieved performance comparable or superior to that of Convolutional Neural Networks (CNNs) in computer vision. This empirical breakthrough is even more remarkable since, in contrast to CNNs, ViTs do not embed any visual inductive bias…
External link:
http://arxiv.org/abs/2210.09221
Neural Ordinary Differential Equations (Neural ODEs) are the continuous analog of Residual Neural Networks (ResNets). We investigate whether the discrete dynamics defined by a ResNet are close to the continuous one of a Neural ODE. We first quantify…
External link:
http://arxiv.org/abs/2205.14612
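The discrete-continuous correspondence studied here can be sketched in a toy example: a residual update x_{n+1} = x_n + h·f(x_n) is one explicit Euler step for the ODE x'(t) = f(x(t)). The residual branch and step sizes below are made up for illustration; refining the discretization makes the discrete output approach the continuous flow.

```python
import numpy as np

def f(x):
    return np.tanh(0.5 * x)  # toy residual branch (hypothetical choice)

def resnet_forward(x, depth, h):
    # Discrete residual updates: x_{n+1} = x_n + h * f(x_n),
    # i.e. explicit Euler steps for the ODE x'(t) = f(x(t)).
    for _ in range(depth):
        x = x + h * f(x)
    return x

# Same time horizon t in [0, 1], coarser vs finer discretization.
coarse = resnet_forward(1.0, depth=10, h=1.0 / 10)
fine = resnet_forward(1.0, depth=1000, h=1.0 / 1000)
print(coarse, fine)  # the two values are close
```

The gap between the two outputs shrinks as the depth grows, which is the kind of discrete-to-continuous closeness the abstract sets out to quantify.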
Attention based models such as Transformers involve pairwise interactions between data points, modeled with a learnable attention matrix. Importantly, this attention matrix is normalized with the SoftMax operator, which makes it row-wise stochastic.
External link:
http://arxiv.org/abs/2110.11773
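The row-stochastic property mentioned above is easy to check numerically with a generic scaled dot-product attention sketch (shapes and values here are made up): applying SoftMax row-wise makes every row of the attention matrix a probability distribution.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))  # queries
K = rng.normal(size=(4, 8))  # keys
A = softmax(Q @ K.T / np.sqrt(8))  # attention matrix

print(A.sum(axis=1))  # each row sums to 1: A is row-wise stochastic
```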
Author:
Grant, Michael C., Crisafi, Cheryl, Alvarez, Adrian, Arora, Rakesh C., Brindle, Mary E., Chatterjee, Subhasis, Ender, Joerg, Fletcher, Nick, Gregory, Alexander J., Gunaydin, Serdar, Jahangiri, Marjan, Ljungqvist, Olle, Lobdell, Kevin W., Morton, Vicki, Reddy, V. Seenu, Salenger, Rawn, Sander, Michael, Zarbock, Alexander, Engelman, Daniel T.
Published in:
In The Annals of Thoracic Surgery April 2024 117(4):669-689
The training of deep residual neural networks (ResNets) with backpropagation has a memory cost that increases linearly with respect to the depth of the network. A way to circumvent this issue is to use reversible architectures. In this paper, we propose…
External link:
http://arxiv.org/abs/2102.07870
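Reversible architectures avoid the linear memory cost because each block's input can be recomputed exactly from its output during the backward pass, so activations need not be stored. A classic additive coupling in this spirit (a generic RevNet-style sketch, not necessarily the architecture proposed in the paper; f and g are made-up branches) looks like:

```python
import numpy as np

def f(x):
    return np.sin(x)        # toy residual branch (hypothetical)

def g(x):
    return 0.5 * np.tanh(x)  # toy residual branch (hypothetical)

def forward(x1, x2):
    y1 = x1 + f(x2)
    y2 = x2 + g(y1)
    return y1, y2

def inverse(y1, y2):
    # Exactly undo the forward pass, recovering the inputs
    # without having stored them.
    x2 = y2 - g(y1)
    x1 = y1 - f(x2)
    return x1, x2

y1, y2 = forward(1.0, 2.0)
x1, x2 = inverse(y1, y2)
print(x1, x2)  # recovers the original inputs 1.0, 2.0
```

Because `inverse` reconstructs the activations on the fly, memory use is constant in depth rather than linear.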
Academic article
This result cannot be displayed to unauthenticated users; you must log in to view it.
Academic article
This result cannot be displayed to unauthenticated users; you must log in to view it.