Showing 1 - 10 of 1,945 results for search: '"ZHOU, Ding"'
Pre-training Transformer models is resource-intensive, and recent studies have shown that sign momentum is an efficient technique for training large-scale deep learning models, particularly Transformers. However, its application in distributed training…
External link:
http://arxiv.org/abs/2411.17866
Authors:
Xie, Yanyue, Zhang, Zhi, Zhou, Ding, Xie, Cong, Song, Ziang, Liu, Xin, Wang, Yanzhi, Lin, Xue, Xu, An
Mixture-of-Experts (MoE) architectures face challenges such as high memory consumption and redundancy in experts. Pruning MoE can reduce network weights while maintaining model performance. Motivated by the recent observation of emergent large magnitude…
External link:
http://arxiv.org/abs/2410.12013
In recent years, there has been growing interest in the field of functional neural networks. They have been proposed and studied with the aim of approximating continuous functionals defined on sets of functions on Euclidean domains. In this paper, we…
External link:
http://arxiv.org/abs/2410.01047
In this work, we explore intersections between sparse coding and deep learning to enhance our understanding of feature extraction capabilities in advanced neural network architectures. We begin by introducing a novel class of Deep Sparse Coding (DSC)…
External link:
http://arxiv.org/abs/2408.05540
In this paper, we explore the approximation theory of functions defined on graphs. Our study builds upon the approximation results derived from the $K$-functional. We establish a theoretical framework to assess the lower bounds of approximation for t…
External link:
http://arxiv.org/abs/2407.01281
While considerable theoretical progress has been devoted to the study of metric and similarity learning, the generalization mystery is still missing. In this paper, we study the generalization performance of metric and similarity learning by leveraging…
External link:
http://arxiv.org/abs/2405.06415
We study approximation and learning capacities of convolutional neural networks (CNNs) with one-side zero-padding and multiple channels. Our first result proves a new approximation bound for CNNs with certain constraint on the weights. Our second result…
External link:
http://arxiv.org/abs/2403.16459
Promptly discovering unknown network attacks is critical for reducing the risk of major loss imposed on systems or equipment. This paper aims to develop an open-set intrusion detection model to classify known attacks as well as infer unknown ones.
External link:
http://arxiv.org/abs/2403.04193
Authors:
Zhou, Ding-Bang, Gao, Kuang-Hong, Zhao, Meng-Fan, Jia, Zhi-Yan, Hu, Xiao-Xia, Guo, Qian-Jin, Du, Hai-Yan, Chen, Xiao-Ping, Li, Zhi-Qing
Layered transition metal chalcogenides have stimulated a wide research interest due to their many exotic physical properties. In this paper, we studied the magnetotransport properties of the exfoliated TaNiTe5, a recently discovered Dirac nodal-line…
External link:
http://arxiv.org/abs/2402.16088
Authors:
Jiang, Ziheng, Lin, Haibin, Zhong, Yinmin, Huang, Qi, Chen, Yangrui, Zhang, Zhi, Peng, Yanghua, Li, Xiang, Xie, Cong, Nong, Shibiao, Jia, Yulu, He, Sun, Chen, Hongmin, Bai, Zhihao, Hou, Qi, Yan, Shipeng, Zhou, Ding, Sheng, Yiyao, Jiang, Zhuo, Xu, Haohan, Wei, Haoran, Zhang, Zhang, Nie, Pengfei, Zou, Leqi, Zhao, Sida, Xiang, Liang, Liu, Zherui, Li, Zhe, Jia, Xiaoying, Ye, Jianxi, Jin, Xin, Liu, Xin
We present the design, implementation and engineering experience in building and deploying MegaScale, a production system for training large language models (LLMs) at the scale of more than 10,000 GPUs. Training LLMs at this scale brings unprecedented…
External link:
http://arxiv.org/abs/2402.15627