Showing 1 - 7 of 7 results for search: '"Trockman, Asher"'
Recent work has shown that state space models such as Mamba are significantly worse than Transformers on recall-based tasks because their state size is constant with respect to their input sequence length. But in practice, state space models…
External link:
http://arxiv.org/abs/2410.11135
Author:
Trockman, Asher; Kolter, J. Zico
It is notoriously difficult to train Transformers on small datasets; typically, large pre-trained models are instead used as the starting point. We explore the weights of such pre-trained Transformers (particularly for vision) to attempt to find reas…
External link:
http://arxiv.org/abs/2305.09828
Neural network weights are typically initialized at random from univariate distributions, controlling just the variance of individual weights even in highly structured operations like convolutions. Recent ViT-inspired convolutional networks such as C…
External link:
http://arxiv.org/abs/2210.03651
Author:
Trockman, Asher; Kolter, J. Zico
Although convolutional networks have been the dominant architecture for vision tasks for many years, recent experiments have shown that Transformer-based models, most notably the Vision Transformer (ViT), may exceed their performance in some settings…
External link:
http://arxiv.org/abs/2201.09792
Author:
Trockman, Asher; Kolter, J. Zico
Recent work has highlighted several advantages of enforcing orthogonality in the weight layers of deep networks, such as maintaining the stability of activations, preserving gradient norms, and enhancing adversarial robustness by enforcing low Lipschitz…
External link:
http://arxiv.org/abs/2104.07167
Published in:
ICSE: International Conference on Software Engineering; 5/25/2019, p476-487, 12p
Author:
Trockman, Asher
Published in:
ICSE: International Conference on Software Engineering; 5/27/2018, p524-526, 3p