Showing 1 - 10 of 34 for search: "Li, Shanda"
While the scaling laws of large language model (LLM) training have been extensively studied, optimal inference configurations of LLMs remain underexplored. We study inference scaling laws and compute-optimal inference, focusing on the trade-offs between…
External link:
http://arxiv.org/abs/2408.00724
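As a rough way to see the trade-off this entry refers to (a back-of-envelope approximation, not a formula taken from the paper): for a decoder-only model with $N$ parameters generating $T$ tokens, inference cost is roughly $C_{\mathrm{inf}} \approx 2NT$ FLOPs, so a fixed compute budget can be spent either on a larger model or on generating more tokens (for example, more samples for voting or tree search).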
Authors:
Li, Shanda, You, Chong, Guruganesh, Guru, Ainslie, Joshua, Ontanon, Santiago, Zaheer, Manzil, Sanghai, Sumit, Yang, Yiming, Kumar, Sanjiv, Bhojanapalli, Srinadh
Preventing the performance decay of Transformers on inputs longer than those used for training has been an important challenge in extending the context length of these models. Though the Transformer architecture has no fundamental limits on the input…
External link:
http://arxiv.org/abs/2310.04418
Authors:
Choromanski, Krzysztof Marcin, Li, Shanda, Likhosherstov, Valerii, Dubey, Kumar Avinava, Luo, Shengjie, He, Di, Yang, Yiming, Sarlos, Tamas, Weingarten, Thomas, Weller, Adrian
We propose a new class of linear Transformers called FourierLearner-Transformers (FLTs), which incorporate a wide range of relative positional encoding mechanisms (RPEs). These include regular RPE techniques applied for sequential data, as well as…
External link:
http://arxiv.org/abs/2302.01925
The Physics-Informed Neural Network (PINN) approach is a new and promising way to solve partial differential equations using deep learning. The $L^2$ Physics-Informed Loss is the de facto standard in training Physics-Informed Neural Networks. In this…
External link:
http://arxiv.org/abs/2206.02016
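For context, a minimal sketch of the $L^2$ physics-informed loss mentioned above, written in PyTorch for a 1D Poisson problem $u''(x) = f(x)$ with zero boundary values; the network size, the collocation sampler, and the equal weighting of the two terms are illustrative assumptions, not the configuration from the paper.

    import torch

    # Small MLP surrogate u_theta(x); the architecture is an illustrative choice.
    net = torch.nn.Sequential(
        torch.nn.Linear(1, 64), torch.nn.Tanh(),
        torch.nn.Linear(64, 64), torch.nn.Tanh(),
        torch.nn.Linear(64, 1),
    )

    def f(x):
        # Example source term for u''(x) = f(x); the exact solution is u(x) = sin(pi x).
        return -(torch.pi ** 2) * torch.sin(torch.pi * x)

    def l2_pinn_loss(n_interior=128):
        # Interior collocation points sampled uniformly in (0, 1).
        x = torch.rand(n_interior, 1, requires_grad=True)
        u = net(x)
        # First and second derivatives of the network output via autograd.
        du = torch.autograd.grad(u, x, torch.ones_like(u), create_graph=True)[0]
        d2u = torch.autograd.grad(du, x, torch.ones_like(du), create_graph=True)[0]
        residual = d2u - f(x)
        # Boundary points x = 0 and x = 1 with u = 0.
        ub = net(torch.tensor([[0.0], [1.0]]))
        # L^2 loss: mean squared PDE residual plus mean squared boundary error.
        return (residual ** 2).mean() + (ub ** 2).mean()

    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for step in range(2000):
        opt.zero_grad()
        l2_pinn_loss().backward()
        opt.step()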
Relative Positional Encoding (RPE), which encodes the relative distance between any pair of tokens, is one of the most successful modifications to the original Transformer. As far as we know, a theoretical understanding of RPE-based Transformers is…
External link:
http://arxiv.org/abs/2205.13401
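As a reminder of what RPE refers to, here is a generic sketch of one common variant, a learned bias indexed by the relative distance j - i and added to the attention logits; this illustrates the general mechanism, not necessarily the exact formulation analyzed in the paper.

    import torch

    def rpe_attention(q, k, v, rel_bias):
        # Single-head attention with an additive relative positional bias.
        # q, k, v: (n, d) tensors; rel_bias: (2n - 1,) vector indexed by the
        # relative distance j - i between key position j and query position i.
        n, d = q.shape
        scores = q @ k.T / d ** 0.5                  # (n, n) content-based logits
        idx = torch.arange(n)
        rel = idx[None, :] - idx[:, None] + (n - 1)  # map j - i into [0, 2n - 2]
        scores = scores + rel_bias[rel]              # add the bias b_{j-i} to each logit
        return torch.softmax(scores, dim=-1) @ v

    # Usage with random inputs and a learnable bias table.
    n, d = 8, 16
    q, k, v = (torch.randn(n, d) for _ in range(3))
    rel_bias = torch.nn.Parameter(torch.zeros(2 * n - 1))
    out = rpe_attention(q, k, v, rel_bias)           # (n, d)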
Authors:
He, Di, Li, Shanda, Shi, Wenlei, Gao, Xiaotian, Zhang, Jia, Bian, Jiang, Wang, Liwei, Liu, Tie-Yan
Physics-Informed Neural Network (PINN) has become a commonly used machine learning approach to solving partial differential equations (PDEs). However, on high-dimensional second-order PDE problems, PINN suffers from severe scalability issues since it…
External link:
http://arxiv.org/abs/2202.09340
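To illustrate the scalability issue alluded to above (a generic illustration of why second-order terms are expensive, not the remedy proposed in the paper): computing the Laplacian of a network output with nested automatic differentiation needs one extra backward pass per input dimension, so the cost of each loss evaluation grows with the PDE dimension d.

    import torch

    net = torch.nn.Sequential(torch.nn.Linear(10, 64), torch.nn.Tanh(), torch.nn.Linear(64, 1))

    def laplacian(net, x):
        # x: (batch, d) with requires_grad=True; returns the (batch,) Laplacian of net.
        u = net(x)
        grad = torch.autograd.grad(u, x, torch.ones_like(u), create_graph=True)[0]
        lap = torch.zeros(x.shape[0])
        for i in range(x.shape[1]):          # one extra backward pass per dimension
            d2u_i = torch.autograd.grad(grad[:, i].sum(), x, create_graph=True)[0][:, i]
            lap = lap + d2u_i
        return lap

    x = torch.rand(32, 10, requires_grad=True)
    print(laplacian(net, x).shape)           # torch.Size([32])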
Several recent studies have demonstrated that attention-based networks, such as Vision Transformer (ViT), can outperform Convolutional Neural Networks (CNNs) on several computer vision tasks without using convolutional layers. This naturally leads to…
External link:
http://arxiv.org/abs/2111.01353
Published in:
Engineering Applications of Artificial Intelligence, July 2024, Vol. 133, Part A
Authors:
Luo, Shengjie, Li, Shanda, Cai, Tianle, He, Di, Peng, Dinglan, Zheng, Shuxin, Ke, Guolin, Wang, Liwei, Liu, Tie-Yan
The attention module, which is a crucial component of the Transformer, cannot scale efficiently to long sequences due to its quadratic complexity. Many works focus on approximating the dot-then-exponentiate softmax function in the original attention, leading to…
External link:
http://arxiv.org/abs/2106.12566
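For orientation, a minimal sketch of the kernelized style of approximation alluded to above: a feature map phi is applied to queries and keys so that attention can be computed in linear rather than quadratic time. The elu + 1 feature map used here is one common generic choice, not necessarily the mechanism proposed in this paper.

    import torch

    def linear_attention(q, k, v, eps=1e-6):
        # Kernelized attention: phi(q) (phi(k)^T v) instead of softmax(q k^T) v.
        # q, k: (n, d), v: (n, d_v); cost is O(n * d * d_v) rather than O(n^2 * d).
        phi_q = torch.nn.functional.elu(q) + 1       # positive feature map (one common choice)
        phi_k = torch.nn.functional.elu(k) + 1
        kv = phi_k.T @ v                             # (d, d_v), aggregated once over all keys
        z = phi_q @ phi_k.sum(dim=0, keepdim=True).T + eps   # (n, 1) normalizer
        return (phi_q @ kv) / z

    n, d = 8, 16
    q, k, v = (torch.randn(n, d) for _ in range(3))
    out = linear_attention(q, k, v)                  # same shape as softmax attention: (n, d)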
Authors:
Cong, Peichao, Li, Shanda, Zhou, Jiachao, Lv, Kunfeng, Feng, Hao
Published in:
Agronomy, Jan 2023, Vol. 13, Issue 1, p. 196, 24 pp.