Showing 1 - 3 of 3 for search: '"Dooms, Thomas"'
A mechanistic understanding of how MLPs do computation in deep neural networks remains elusive. Current interpretability work can extract features from hidden activations over an input dataset but generally cannot explain how MLP weights construct fe
External link:
http://arxiv.org/abs/2410.08417
Gated Linear Units (GLUs) have become a common building block in modern foundation models. Bilinear layers drop the non-linearity in the "gate" but still have comparable performance to other GLUs. An attractive quality of bilinear layers is that they
External link:
http://arxiv.org/abs/2406.03947
Modern machine learning models are able to outperform humans on a variety of non-trivial tasks. However, as the complexity of the models increases, they consume significant amounts of power and still struggle to generalize effectively to unseen data.
External link:
http://arxiv.org/abs/2311.18130