Showing 1 - 10 of 18 517 results for the search: '"TAN, M"'
Author: Teo, Rachel S. Y., Nguyen, Tan M.
Sparse Mixture of Experts (SMoE) has become the key to unlocking unparalleled scalability in deep learning. SMoE has the potential to exponentially increase parameter count while maintaining the efficiency of the model by only activating a small subs…
External link: http://arxiv.org/abs/2410.14574
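
To make the "activating a small subset" idea concrete, below is a minimal NumPy sketch of top-k expert gating, the standard mechanism behind SMoE layers. It is an illustration only, not code from the paper; the name smoe_layer and the toy linear experts are invented for this example.

import numpy as np

def smoe_layer(x, gate_w, experts, k=2):
    # x: (d,) token, gate_w: (E, d) router weights, experts: list of E callables
    logits = gate_w @ x                                   # one router score per expert
    topk = np.argsort(logits)[-k:]                        # keep only the k best experts
    probs = np.exp(logits[topk] - logits[topk].max())
    probs /= probs.sum()                                  # softmax over the selected experts only
    return sum(p * experts[i](x) for p, i in zip(probs, topk))

# Toy usage: 4 experts but only 2 evaluated per token, so per-token compute stays
# roughly constant while total parameter count grows with the number of experts.
rng = np.random.default_rng(0)
d, E = 8, 4
experts = [(lambda x, W=rng.normal(size=(d, d)): W @ x) for _ in range(E)]
y = smoe_layer(rng.normal(size=d), rng.normal(size=(E, d)), experts, k=2)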
Neural functional networks (NFNs) have recently gained significant attention due to their diverse applications, ranging from predicting network generalization and network editing to classifying implicit neural representation. Previous NFN designs oft…
External link: http://arxiv.org/abs/2409.11697
Author: Nguyen, Tan M., Nguyen, Tam, Ho, Nhat, Bertozzi, Andrea L., Baraniuk, Richard G., Osher, Stanley J.
Self-attention is key to the remarkable success of transformers in sequence modeling tasks, including many applications in natural language processing and computer vision. Like neural network layers, these attention mechanisms are often developed by h…
External link: http://arxiv.org/abs/2406.13781
Pairwise dot-product self-attention is key to the success of transformers that achieve state-of-the-art performance across a variety of applications in language and vision. This dot-product self-attention computes attention weights among the input to…
External link: http://arxiv.org/abs/2406.13770
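
As a reference point for the attention-related entries above, the following NumPy sketch shows the standard pairwise dot-product self-attention computation (softmax of scaled query-key products applied to the values); it is the textbook baseline, not the variant proposed in the paper, and the function name is chosen here for illustration.

import numpy as np

def dot_product_attention(X, Wq, Wk, Wv):
    # X: (n, d) input tokens; Wq, Wk, Wv: (d, d_h) projection matrices
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])               # (n, n) pairwise similarities
    A = np.exp(scores - scores.max(axis=-1, keepdims=True))
    A /= A.sum(axis=-1, keepdims=True)                     # row-wise softmax: attention weights over tokens
    return A @ V                                            # each output is a weighted average of value vectors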
Author: Teo, Rachel S. Y., Nguyen, Tan M.
The remarkable success of transformers in sequence modeling tasks, spanning various applications in natural language processing and computer vision, is attributed to the critical role of self-attention. Similar to the development of most deep learnin…
External link: http://arxiv.org/abs/2406.13762
Sliced Wasserstein (SW) distance in Optimal Transport (OT) is widely used in various applications thanks to its statistical effectiveness and computational efficiency. On the other hand, Tree Wasserstein (TW) and Tree-sliced Wasserstein (TSW) are ins…
External link: http://arxiv.org/abs/2406.13725
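
For context, the sliced Wasserstein distance mentioned here projects both distributions onto random one-dimensional directions and averages the cheap 1D Wasserstein costs. The NumPy sketch below is a generic Monte Carlo estimator for equally sized point clouds, not the tree-sliced construction studied in the paper.

import numpy as np

def sliced_wasserstein(X, Y, n_proj=100, p=2, seed=0):
    # X, Y: (n, d) samples with the same n; estimates SW_p over n_proj random slices
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=(n_proj, X.shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)   # directions on the unit sphere
    costs = []
    for t in theta:
        x1d, y1d = np.sort(X @ t), np.sort(Y @ t)            # 1D optimal transport reduces to sorting
        costs.append(np.mean(np.abs(x1d - y1d) ** p))
    return np.mean(costs) ** (1.0 / p)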
In this work, we address two main shortcomings of transformer architectures: input corruption and rank collapse in their output representation. We unveil self-attention as an autonomous state-space model that inherently promotes smoothness in its sol…
External link: http://arxiv.org/abs/2402.15989
Transformers have achieved remarkable success in a wide range of natural language processing and computer vision applications. However, the representation capacity of a deep transformer model is degraded due to the over-smoothing issue, in which the t…
External link: http://arxiv.org/abs/2312.00751
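
The over-smoothing issue referred to here can be reproduced with a toy experiment: repeatedly applying a row-stochastic mixing matrix (a stand-in for an attention map, with no residuals or feed-forward blocks) drives all token representations toward their common mean. The NumPy illustration below assumes nothing about the paper's actual method.

import numpy as np

rng = np.random.default_rng(0)
n, d = 16, 32
X = rng.normal(size=(n, d))                    # token representations
A = rng.random((n, n))
A /= A.sum(axis=1, keepdims=True)              # row-stochastic "attention" weights

for layer in range(12):
    X = A @ X                                  # pure mixing step
    spread = np.linalg.norm(X - X.mean(axis=0), axis=1).mean()
    print(f"layer {layer + 1:2d}: mean distance to the token average = {spread:.4f}")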
We propose the Kuramoto Graph Neural Network (KuramotoGNN), a novel class of continuous-depth graph neural networks (GNNs) that employs the Kuramoto model to mitigate the over-smoothing phenomenon, in which node features in GNNs become indistinguisha…
External link: http://arxiv.org/abs/2311.03260
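
For background, the Kuramoto model that KuramotoGNN builds on is the coupled-oscillator system d(theta_i)/dt = omega_i + (K/N) * sum_j sin(theta_j - theta_i). The NumPy sketch below integrates the textbook all-to-all version with forward Euler; the paper's GNN couples node features along graph edges instead, so this shows only the underlying dynamical system.

import numpy as np

rng = np.random.default_rng(0)
N, K, dt, steps = 20, 1.5, 0.05, 200
theta = rng.uniform(0, 2 * np.pi, size=N)      # oscillator phases
omega = rng.normal(0, 0.5, size=N)             # natural frequencies

for _ in range(steps):
    coupling = np.sin(theta[None, :] - theta[:, None]).sum(axis=1)   # sum_j sin(theta_j - theta_i)
    theta = theta + dt * (omega + (K / N) * coupling)

# Order parameter r in [0, 1]; r close to 1 means the phases have synchronized,
# the analogue of node features becoming indistinguishable.
r = np.abs(np.exp(1j * theta).mean())
print(f"order parameter r = {r:.3f}")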
$p$-Laplacian regularization, rooted in graph and image signal processing, introduces a parameter $p$ to control the regularization effect on these data. Smaller values of $p$ promote sparsity and interpretability, while larger values encourage smoot…
External link: http://arxiv.org/abs/2311.03235
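
As a reminder of the object being tuned, the standard graph $p$-Laplacian regularizer is $S_p(f) = \frac{1}{p}\sum_{(i,j) \in E} w_{ij}\,|f_i - f_j|^p$: small $p$ favors sparse, piecewise-constant signals, while $p = 2$ recovers the usual Laplacian smoothness energy. The NumPy sketch below evaluates this standard form on a toy path graph; the exact objective used in the paper may differ.

import numpy as np

def p_laplacian_energy(f, edges, weights, p=2.0):
    # f: (n,) node signal; edges: list of (i, j) pairs; weights: positive edge weights
    diffs = np.array([abs(f[i] - f[j]) for i, j in edges])
    return (weights * diffs ** p).sum() / p

# Toy path graph 0-1-2-3 with unit weights; compare the energies for p = 1 and p = 2.
f = np.array([0.0, 0.1, 0.9, 1.0])
edges = [(0, 1), (1, 2), (2, 3)]
w = np.ones(len(edges))
print(p_laplacian_energy(f, edges, w, p=1.0), p_laplacian_energy(f, edges, w, p=2.0))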