Showing 1 - 10 of 179 for the search: '"Ma, Shuming"'
Recent research on 1-bit Large Language Models (LLMs), such as BitNet b1.58, presents a promising direction for reducing the inference cost of LLMs while maintaining their performance. In this work, we introduce BitNet a4.8, enabling 4-bit activations for 1-bit LLMs.
External link:
http://arxiv.org/abs/2411.04965
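The a4.8 snippet refers to 4-bit activations. Below is a minimal sketch of per-token absmax quantization to the INT4 range, one plausible way to quantize activations alongside low-bit weights; the function name and the exact rounding/scaling choices are illustrative assumptions, not the paper's implementation.

```python
import torch

def quant_act_int4(x: torch.Tensor, eps: float = 1e-5):
    """Per-token absmax quantization of activations to the INT4 range.

    Illustrative only: BitNet a4.8 combines 4-bit activations with further
    hybrid quantization/sparsification that this sketch does not reproduce.
    """
    # Scale each token (last dim) so its largest magnitude maps to 7 (INT4 spans [-8, 7]).
    scale = 7.0 / x.abs().max(dim=-1, keepdim=True).values.clamp(min=eps)
    q = (x * scale).round().clamp(-8, 7)
    return q, scale

x = torch.randn(2, 8)                # two token activation vectors
q, s = quant_act_int4(x)
x_hat = q / s                        # dequantized approximation
print(q)                             # integer values in [-8, 7]
print((x_hat - x).abs().max())       # quantization error
```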
Author:
Wang, Jinheng, Zhou, Hansong, Song, Ting, Mao, Shaoguang, Ma, Shuming, Wang, Hongyu, Xia, Yan, Wei, Furu
Recent advances in 1-bit Large Language Models (LLMs), such as BitNet and BitNet b1.58, present a promising approach to enhancing the efficiency of LLMs in terms of speed and energy consumption. These developments also enable local LLM deployment across a broad range of devices.
External link:
http://arxiv.org/abs/2410.16144
We introduce Q-Sparse, a simple yet effective approach to training sparsely-activated large language models (LLMs). Q-Sparse enables full sparsity of activations in LLMs, which can bring significant efficiency gains in inference. This is achieved by applying top-K sparsification to the activations and the straight-through estimator in training.
External link:
http://arxiv.org/abs/2407.10969
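The Q-Sparse snippet mentions top-K sparsification of activations combined with a straight-through estimator. The sketch below illustrates that idea; the class name and the choice to pass gradients through unchanged are assumptions for illustration, not the released code.

```python
import torch

class TopKSparsify(torch.autograd.Function):
    """Keep the K largest-magnitude activations per token; zero the rest.

    The backward pass uses a straight-through estimator: gradients flow as
    if the sparsification were the identity, so masked activations still
    receive a learning signal. A hedged sketch, not Q-Sparse's exact code.
    """

    @staticmethod
    def forward(ctx, x, k):
        idx = x.abs().topk(k, dim=-1).indices
        mask = torch.zeros_like(x).scatter_(-1, idx, 1.0)
        return x * mask

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out, None  # straight-through: pass gradients unchanged

x = torch.randn(2, 16, requires_grad=True)
y = TopKSparsify.apply(x, 4)       # only 4 of 16 activations stay non-zero per row
y.sum().backward()                 # gradients reach every entry of x
print((y != 0).sum(dim=-1))        # tensor([4, 4])
```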
Author:
Sun, Yutao, Dong, Li, Zhu, Yi, Huang, Shaohan, Wang, Wenhui, Ma, Shuming, Zhang, Quanlu, Wang, Jianyong, Wei, Furu
We introduce a decoder-decoder architecture, YOCO, for large language models, which only caches key-value pairs once. It consists of two components, i.e., a cross-decoder stacked upon a self-decoder. The self-decoder efficiently encodes global key-value (KV) caches that are reused by the cross-decoder via cross-attention.
External link:
http://arxiv.org/abs/2405.05254
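The YOCO snippet describes a self-decoder that builds a single global KV cache which the cross-decoder then reuses. The toy sketch below only illustrates the "cache once, reuse everywhere" idea; the random projections, toy shapes, and absence of causal masking are all simplifying assumptions, not the model's actual layers.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d, n_tokens, n_cross_layers = 64, 128, 4

hidden = torch.randn(1, n_tokens, d)              # stand-in for self-decoder output

# Key-value pairs are produced ONCE from the self-decoder output...
w_k, w_v = torch.randn(d, d), torch.randn(d, d)
global_k, global_v = hidden @ w_k, hidden @ w_v   # the single shared KV cache

def cross_decoder_layer(x, k, v, w_q):
    """One cross-attention layer that reads the shared global KV cache."""
    q = x @ w_q
    attn = F.softmax(q @ k.transpose(-2, -1) / d ** 0.5, dim=-1)
    return x + attn @ v

# ...and every cross-decoder layer attends to that same cache, so KV memory
# does not grow with the number of cross-decoder layers.
x = hidden
for _ in range(n_cross_layers):
    x = cross_decoder_layer(x, global_k, global_v, torch.randn(d, d))
print(x.shape)                                    # torch.Size([1, 128, 64])
```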
Author:
Ma, Shuming, Wang, Hongyu, Ma, Lingxiao, Wang, Lei, Wang, Wenhui, Huang, Shaohan, Dong, Li, Wang, Ruiping, Xue, Jilong, Wei, Furu
Recent research, such as BitNet, is paving the way for a new era of 1-bit Large Language Models (LLMs). In this work, we introduce a 1-bit LLM variant, namely BitNet b1.58, in which every single parameter (or weight) of the LLM is ternary {-1, 0, 1}.
External link:
http://arxiv.org/abs/2402.17764
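The b1.58 snippet states that every weight is ternary {-1, 0, 1}. The paper describes an absmean quantization function: scale the weight matrix by its mean absolute value, then round and clip to {-1, 0, 1}. A minimal sketch of that step, with training-time details such as straight-through gradients omitted:

```python
import torch

def absmean_ternary(w: torch.Tensor, eps: float = 1e-5):
    """Quantize weights to {-1, 0, 1} with the absmean scheme described
    for BitNet b1.58: scale by the mean absolute weight, then round and clip.
    """
    gamma = w.abs().mean().clamp(min=eps)
    w_q = (w / gamma).round().clamp(-1, 1)
    return w_q, gamma

w = torch.randn(4, 4)
w_q, gamma = absmean_ternary(w)
print(w_q)                  # entries are only -1, 0, or 1
print(w_q * gamma)          # ternary approximation of w
```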
This technical report presents LongViT, a vision Transformer that can process gigapixel images in an end-to-end manner. Specifically, we split the gigapixel image into a sequence of millions of patches and project them linearly into embeddings. LongNet is then employed to model the resulting extremely long sequence.
External link:
http://arxiv.org/abs/2312.03558
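The LongViT snippet describes splitting a gigapixel image into a sequence of patches and projecting them linearly into embeddings. A toy-sized sketch of that patchify-and-embed step follows; the patch size, dimensions, and plain nn.Linear projection are illustrative assumptions, and a real gigapixel input would yield millions of patches that are then fed to LongNet.

```python
import torch

patch, dim = 32, 128
image = torch.randn(3, 1024, 1024)                 # toy stand-in for a gigapixel slide

# Split into non-overlapping patch x patch tiles, then flatten each tile.
patches = image.unfold(1, patch, patch).unfold(2, patch, patch)   # (3, H/p, W/p, p, p)
patches = patches.permute(1, 2, 0, 3, 4).reshape(-1, 3 * patch * patch)

# Linear projection of each flattened patch into an embedding vector.
proj = torch.nn.Linear(3 * patch * patch, dim)
embeddings = proj(patches)                         # (num_patches, dim)
print(embeddings.shape)                            # torch.Size([1024, 128])
```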
With in-context learning ability, the performance of large language models can be significantly boosted when provided with appropriate context. However, existing in-context learning methods mainly rely on human-provided contexts, such as labeled examples.
External link:
http://arxiv.org/abs/2311.09263
Author:
Wang, Hongyu, Ma, Shuming, Dong, Li, Huang, Shaohan, Wang, Huaijie, Ma, Lingxiao, Yang, Fan, Wang, Ruiping, Wu, Yi, Wei, Furu
The increasing size of large language models has posed challenges for deployment and raised concerns about environmental impact due to high energy consumption. In this work, we introduce BitNet, a scalable and stable 1-bit Transformer architecture designed for large language models.
External link:
http://arxiv.org/abs/2310.11453
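The BitNet snippet refers to a 1-bit Transformer whose weights are binarized. Below is a minimal sketch of one common 1-bit weight quantization recipe (center, take the sign, keep a per-tensor scale); treat the exact centering and scaling choices as assumptions rather than the reference BitLinear implementation.

```python
import torch

def binarize_weights(w: torch.Tensor):
    """1-bit weight quantization in the spirit of BitNet's BitLinear:
    center the weights, take the sign, and keep a per-tensor scale so the
    binarized matrix approximates the original magnitudes.
    """
    centered = w - w.mean()
    beta = w.abs().mean()              # scaling factor to recover magnitude
    return torch.sign(centered), beta

w = torch.randn(4, 4)
w_b, beta = binarize_weights(w)
print(w_b)                             # entries are -1 or +1
print(w_b * beta)                      # 1-bit approximation of w
```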
Author:
Lv, Tengchao, Huang, Yupan, Chen, Jingye, Zhao, Yuzhong, Jia, Yilin, Cui, Lei, Ma, Shuming, Chang, Yaoyao, Huang, Shaohan, Wang, Wenhui, Dong, Li, Luo, Weiyao, Wu, Shaoxiang, Wang, Guoxin, Zhang, Cha, Wei, Furu
The automatic reading of text-intensive images represents a significant advancement toward achieving Artificial General Intelligence (AGI). In this paper, we present KOSMOS-2.5, a multimodal literate model for machine reading of text-intensive images.
External link:
http://arxiv.org/abs/2309.11419
Author:
Sun, Yutao, Dong, Li, Huang, Shaohan, Ma, Shuming, Xia, Yuqing, Xue, Jilong, Wang, Jianyong, Wei, Furu
In this work, we propose Retentive Network (RetNet) as a foundation architecture for large language models, simultaneously achieving training parallelism, low-cost inference, and good performance. We theoretically derive the connection between recurrence and attention.
External link:
http://arxiv.org/abs/2307.08621
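The RetNet snippet mentions a derived connection between recurrence and attention. The sketch below implements single-head retention in both a parallel (attention-like, with a causal decay matrix) and a recurrent form and checks that they agree; projections, multi-scale decay, and normalization from the full architecture are omitted, so this is a simplified illustration rather than the paper's implementation.

```python
import torch

def retention_parallel(q, k, v, gamma):
    """Parallel form: (Q K^T elementwise-multiplied by a causal decay matrix) V."""
    n = q.shape[0]
    idx = torch.arange(n)
    decay = gamma ** (idx[:, None] - idx[None, :]).clamp(min=0).float()
    decay = torch.tril(decay)                        # D[n, m] = gamma**(n-m) for m <= n
    return (q @ k.T * decay) @ v

def retention_recurrent(q, k, v, gamma):
    """Recurrent form: S_t = gamma * S_{t-1} + k_t^T v_t,  o_t = q_t S_t."""
    state = torch.zeros(q.shape[1], v.shape[1])
    outs = []
    for t in range(q.shape[0]):
        state = gamma * state + k[t:t+1].T @ v[t:t+1]
        outs.append(q[t:t+1] @ state)
    return torch.cat(outs, dim=0)

q, k, v = (torch.randn(5, 8) for _ in range(3))
gamma = 0.9
print(torch.allclose(retention_parallel(q, k, v, gamma),
                     retention_recurrent(q, k, v, gamma), atol=1e-5))  # True
```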