Showing 1 - 10 of 59 for search: '"Shen Xuyang"'
The interest in linear complexity models for large language models is on the rise, although their scaling capacity remains uncertain. In this study, we present the scaling laws for linear complexity language models to establish a foundation for their…
External link:
http://arxiv.org/abs/2406.16690
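Scaling-law studies of this kind typically fit a power law such as L(N) = a · N^(−α) + c to loss measured across model sizes N. A minimal sketch of such a fit; the functional form is the standard one, but the data points and initial guesses below are placeholders, not values from the paper:

```python
# Hedged sketch: fit a power law L(N) = a * N**(-alpha) + c to
# (parameter count, validation loss) pairs. All numbers are hypothetical.
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, alpha, c):
    return a * n ** (-alpha) + c

n_params = np.array([70e6, 160e6, 410e6, 1e9, 3e9])   # model sizes (assumed)
losses   = np.array([3.60, 3.35, 3.10, 2.90, 2.72])   # val. losses (assumed)

(a, alpha, c), _ = curve_fit(power_law, n_params, losses,
                             p0=(10.0, 0.1, 2.0), maxfev=10000)
print(f"fitted: L(N) = {a:.2f} * N^(-{alpha:.3f}) + {c:.2f}")
```

The fitted exponent α is what allows losses at unseen scales to be extrapolated and compared against softmax-attention baselines.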
Linear attention mechanisms have gained prominence in causal language models due to their linear computational complexity and enhanced speed. However, the inherent decay mechanism in linear attention presents challenges when applied to multi-dimensional…
External link:
http://arxiv.org/abs/2405.21022
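The decay mechanism in question can be sketched as a recurrence that exponentially down-weights older key-value outer products. A minimal sketch, assuming the generic decayed-linear-attention form (scalar decay, no feature map or normalization), not the paper's exact model:

```python
# Hedged sketch of decayed causal linear attention:
#   S_t = lam * S_{t-1} + k_t v_t^T,   o_t = q_t @ S_t
# Shapes and the decay value are illustrative assumptions.
import numpy as np

def decayed_linear_attention(q, k, v, lam=0.9):
    """q, k: (T, d_k); v: (T, d_v); returns (T, d_v)."""
    T, d_k = q.shape
    d_v = v.shape[1]
    S = np.zeros((d_k, d_v))
    out = np.zeros((T, d_v))
    for t in range(T):
        S = lam * S + np.outer(k[t], v[t])  # decayed state update
        out[t] = q[t] @ S                   # query the running state
    return out

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((8, 4)) for _ in range(3))
print(decayed_linear_attention(q, k, v).shape)  # (8, 4)
```

Because the decay is applied along one scan order, a token's weight depends on its 1D distance to the query; images and other multi-dimensional data have no single natural order, which is the difficulty the abstract alludes to.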
Author:
Qin, Zhen; Shen, Xuyang; Li, Dong; Sun, Weigao; Birchfield, Stan; Hartley, Richard; Zhong, Yiran
We present the Linear Complexity Sequence Model (LCSM), a comprehensive solution that unites various sequence modeling techniques with linear complexity, including linear attention, state space model, long convolution, and linear RNN, within a single framework…
External link:
http://arxiv.org/abs/2405.17383
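The paper's exact formulation is not reproduced here, but the families it names all admit a linear-time recurrence over a matrix-valued state, differing mainly in how the previous state is scaled. A hedged sketch of that shared skeleton, with the per-step decay as the distinguishing knob:

```python
# Hedged sketch (not the paper's formulation): one linear-complexity
# recurrence whose decay schedule selects the family member.
import numpy as np

def unified_recurrence(q, k, v, decay):
    """q, k: (T, d_k); v: (T, d_v); decay: (T,) per-step scalar."""
    S = np.zeros((q.shape[1], v.shape[1]))
    out = np.zeros((q.shape[0], v.shape[1]))
    for t in range(q.shape[0]):
        S = decay[t] * S + np.outer(k[t], v[t])  # scale old state, add new pair
        out[t] = q[t] @ S
    return out
```

Under this simplification, decay ≡ 1 gives vanilla linear attention, a fixed decay < 1 gives the decayed variants, and a data-dependent decay computed from the input plays the role of the gates in linear RNNs and state space models.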
We present Lightning Attention, the first linear attention implementation that maintains a constant training speed for various sequence lengths under fixed memory consumption. Due to the issue with cumulative summation operations (cumsum), previous linear attention…
External link:
http://arxiv.org/abs/2405.17381
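The cumsum issue arises because causal linear attention naively requires a sequential prefix sum over the entire sequence. The usual remedy, and in rough outline the tiling idea behind Lightning Attention, is block-wise computation: attention inside a block is computed directly, while earlier blocks contribute through a fixed-size running state. A simplified sketch without feature maps or normalization:

```python
# Hedged sketch of block-wise (tiled) causal linear attention.
# Block size is an arbitrary assumption.
import numpy as np

def chunked_linear_attention(q, k, v, block=4):
    T, d_k = q.shape
    d_v = v.shape[1]
    out = np.zeros((T, d_v))
    S = np.zeros((d_k, d_v))            # prefix state: sum of k_i v_i^T
    for s in range(0, T, block):
        e = min(s + block, T)
        qb, kb, vb = q[s:e], k[s:e], v[s:e]
        inter = qb @ S                  # earlier blocks, via the state
        scores = np.tril(qb @ kb.T)     # causal attention inside the block
        out[s:e] = inter + scores @ vb
        S += kb.T @ vb                  # fold this block into the state
    return out

# Sanity check against the naive masked formulation
rng = np.random.default_rng(1)
q, k, v = (rng.standard_normal((10, 3)) for _ in range(3))
naive = np.tril(q @ k.T) @ v
assert np.allclose(chunked_linear_attention(q, k, v), naive)
```

Per-block work is constant, so training speed stays flat as sequence length grows, which matches the constant-speed claim above.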
Author:
Mao, Yuxin; Shen, Xuyang; Zhang, Jing; Qin, Zhen; Zhou, Jinxing; Xiang, Mochu; Zhong, Yiran; Dai, Yuchao
The Text to Audible-Video Generation (TAVG) task involves generating videos with accompanying audio based on text descriptions. Achieving this requires skillful alignment of both audio and video elements. To support research in this field, we have developed…
External link:
http://arxiv.org/abs/2404.14381
Hierarchically gated linear RNN (HGRN) has demonstrated competitive training speed and performance in language modeling while offering efficient inference. However, the recurrent state size of HGRN remains relatively small, limiting its…
External link:
http://arxiv.org/abs/2404.07904
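To see the state-size limitation, compare an elementwise gated linear RNN, whose state is a single length-d vector, with an outer-product-expanded variant whose state is a d_k × d_v matrix. The sketch below follows that state-expansion idea in simplified form; the actual HGRN2 gating details differ:

```python
# Hedged sketch: small vector state vs. expanded matrix state.
import numpy as np

def gated_linear_rnn(x, g):
    """Elementwise gated recurrence; x, g: (T, d); state size d."""
    h = np.zeros(x.shape[1])
    out = []
    for t in range(x.shape[0]):
        h = g[t] * h + (1.0 - g[t]) * x[t]
        out.append(h.copy())
    return np.stack(out)

def expanded_gated_rnn(q, g, v):
    """Matrix-state recurrence: S_t = diag(g_t) S_{t-1} + (1 - g_t) v_t^T."""
    S = np.zeros((g.shape[1], v.shape[1]))
    out = []
    for t in range(g.shape[0]):
        S = g[t][:, None] * S + np.outer(1.0 - g[t], v[t])
        out.append(S.T @ q[t])          # read the state with a query
    return np.stack(out)

rng = np.random.default_rng(0)
T, d_k, d_v = 6, 4, 8
g = 1.0 / (1.0 + np.exp(-rng.standard_normal((T, d_k))))  # gates in (0, 1)
q, v = rng.standard_normal((T, d_k)), rng.standard_normal((T, d_v))
print(expanded_gated_rnn(q, g, v).shape)  # (6, 8); state is 4 x 8
```

Expanding the state from d to d_k × d_v numbers raises recall capacity without changing the linear-time character of the recurrence.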
Sequence Parallel (SP) serves as a prevalent strategy to handle long sequences that exceed the memory limit of a single GPU. However, existing SP methods do not take advantage of linear attention features, resulting in sub-optimal parallelism efficiency…
External link:
http://arxiv.org/abs/2404.02882
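The linear-attention feature existing SP methods miss is that the entire prefix can be summarized in a fixed-size k^T v state, so consecutive ranks only need to hand that state along instead of exchanging full K/V tensors. An in-process sketch (real systems such as LASP do this with point-to-point communication and overlap it with computation):

```python
# Hedged sketch: simulated ranks, each holding one sequence chunk; the only
# inter-rank traffic is the (d_k x d_v) prefix state.
import numpy as np

def sequence_parallel_linear_attention(chunks):
    """chunks: list of (q, k, v) per simulated rank, in sequence order."""
    d_k = chunks[0][1].shape[1]
    d_v = chunks[0][2].shape[1]
    S = np.zeros((d_k, d_v))                  # state handed rank to rank
    outputs = []
    for q, k, v in chunks:                    # stands in for p2p sends
        out = q @ S + np.tril(q @ k.T) @ v    # inter-chunk + intra-chunk
        S = S + k.T @ v                       # fold local chunk into state
        outputs.append(out)
    return np.concatenate(outputs)
```

Communication volume per rank boundary is O(d_k · d_v), independent of sequence length.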
Published in:
Jixie Chuandong, Vol. 42, pp. 129-135 (2018)
To address the limited workspace of two-degree-of-freedom planar parallel mechanisms, a 5R mechanism with a variable drive layout and a large workspace is presented. On the basis of a kinematics analysis, a 3D kinematics simulation analysis…
External link:
https://doaj.org/article/1a53af9a698849579cdfc4fb649c2d37
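For intuition about the workspace argument: in a planar 5R (five-bar) mechanism, the end-effector is reachable exactly when it lies inside the reach annulus of both two-link chains, so workspace area can be estimated by a grid scan of that condition. The pivot positions and link lengths below are made up for illustration, not taken from the paper:

```python
# Hedged sketch: grid-scan workspace estimate for a planar 5R mechanism.
# Base pivots and link lengths are assumed values.
import numpy as np

A1, A2 = np.array([-0.2, 0.0]), np.array([0.2, 0.0])  # base pivots (assumed)
l1 = l2 = l3 = l4 = 0.3                               # link lengths (assumed)

def reachable(p):
    d1, d2 = np.linalg.norm(p - A1), np.linalg.norm(p - A2)
    return (abs(l1 - l2) <= d1 <= l1 + l2) and (abs(l3 - l4) <= d2 <= l3 + l4)

xs = ys = np.linspace(-0.8, 0.8, 200)
hits = sum(reachable(np.array([x, y])) for x in xs for y in ys)
cell = (xs[1] - xs[0]) * (ys[1] - ys[0])
print(f"approx. workspace area: {hits * cell:.3f} m^2")
```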
Author:
Sun, Weigao; Qin, Zhen; Sun, Weixuan; Li, Shidi; Li, Dong; Shen, Xuyang; Qiao, Yu; Zhong, Yiran
The fundamental success of large language models hinges upon the efficacious implementation of large-scale distributed training techniques. Nevertheless, building a vast, high-performance cluster featuring high-speed communication interconnectivity is…
External link:
http://arxiv.org/abs/2401.16265
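The usual way around slow interconnects, which methods in this line build on, is local updating: each worker takes several optimizer steps between synchronizations, so communication is amortized and can be overlapped with computation. A toy in-process simulation of the local-update part on a least-squares problem (the asynchronous overlap itself is not modeled here):

```python
# Hedged sketch: local-SGD-style training with simulated workers.
import numpy as np

rng = np.random.default_rng(0)
X, y = rng.standard_normal((256, 8)), rng.standard_normal(256)
shards = np.array_split(np.arange(256), 4)            # 4 simulated workers

w = np.zeros(8)
for round_ in range(20):                              # communication rounds
    local = []
    for idx in shards:
        wl, lr = w.copy(), 0.05
        for _ in range(5):                            # local steps, no comms
            grad = X[idx].T @ (X[idx] @ wl - y[idx]) / len(idx)
            wl -= lr * grad
        local.append(wl)
    w = np.mean(local, axis=0)                        # all-reduce (average)
print("final loss:", float(np.mean((X @ w - y) ** 2)))
```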
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
Linear attention is an efficient attention mechanism that has recently emerged as a promising alternative to conventional softmax attention. With its ability to process tokens in linear computational complexities, linear attention, in theory, can handle…
External link:
http://arxiv.org/abs/2401.04658
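The practical upshot of the linear form at inference time is a fixed-size recurrent state in place of a growing KV cache, so memory per generated token is constant no matter how long the context gets. A minimal streaming sketch (arbitrary shapes, no normalization or feature map, not the kernel-level implementation the paper describes):

```python
# Hedged sketch: constant-memory streaming linear attention.
import numpy as np

class LinearAttnStream:
    def __init__(self, d_k, d_v):
        self.S = np.zeros((d_k, d_v))   # fixed-size state, never grows
    def step(self, q, k, v):
        self.S += np.outer(k, v)        # absorb the new token
        return q @ self.S               # attend to the whole prefix

stream = LinearAttnStream(4, 4)
rng = np.random.default_rng(2)
for _ in range(1_000):                  # arbitrarily long context, same memory
    q, k, v = rng.standard_normal((3, 4))
    o = stream.step(q, k, v)
print(o.shape)                          # (4,)
```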