Showing 1 - 10 of 141 results for search: '"Wu, Yongji"'
Authors:
Wu, Yongji, Qu, Wenjie, Tao, Tianyang, Wang, Zhuang, Bai, Wei, Li, Zhuohao, Tian, Yuan, Zhang, Jiaheng, Lentz, Matthew, Zhuo, Danyang
Sparsely-activated Mixture-of-Experts (MoE) architecture has increasingly been adopted to further scale large language models (LLMs) due to its sub-linear scaling for computation costs. However, frequent failures still pose significant challenges as…
External link:
http://arxiv.org/abs/2407.04656
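The abstract attributes MoE's efficiency to sparse activation. As a minimal sketch, assuming a generic top-k softmax router (not the fault-tolerance system the paper itself presents), the Python below shows why per-token compute depends on k rather than on the total number of experts: only the k highest-scoring experts run for each token. The names `route_top_k` and `moe_forward` and the toy scalar experts are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(gate_logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Hypothetical router: pick the k highest-scoring experts and renormalize their weights."""
    top = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    weights = softmax([gate_logits[i] for i in top])
    return list(zip(top, weights))


def moe_forward(
    x: float,
    gate_logits: Sequence[float],
    experts: Sequence[Callable[[float], float]],
    k: int = 2,
) -> Tuple[float, int]:
    """Weighted sum over only the selected experts; returns (output, number of experts called)."""
    out = 0.0
    calls = 0
    for idx, weight in route_top_k(gate_logits, k):
        out += weight * experts[idx](x)  # unselected experts are never evaluated
        calls += 1
    return out, calls


if __name__ == "__main__":
    # Toy experts: expert i multiplies its input by (i + 1).
    experts = [lambda x, m=i + 1: m * x for i in range(8)]
    logits = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]
    y, calls = moe_forward(1.0, logits, experts, k=2)
    print(f"output={y:.4f}, experts called={calls} of {len(experts)}")
```

With eight experts and `k=2`, `moe_forward` evaluates exactly two experts per input, so adding experts grows model capacity without growing per-token expert compute.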
Authors:
Xu, Ceyu, Wu, Yongji, Yang, Xinyu, Chen, Beidi, Lentz, Matthew, Zhuo, Danyang, Wills, Lisa Wu
As the parameter size of large language models (LLMs) continues to expand, the need for a large memory footprint and high communication bandwidth have become significant bottlenecks for the training and inference of LLMs. To mitigate these bottleneck…
External link:
http://arxiv.org/abs/2407.00467
Authors:
Jin, Shuowei, Wu, Yongji, Zheng, Haizhong, Zhang, Qingzhao, Lentz, Matthew, Mao, Z. Morley, Prakash, Atul, Qian, Feng, Zhuo, Danyang
Large language models (LLMs) have seen significant adoption for natural language tasks, owing their success to massive numbers of model parameters (e.g., 70B+); however, LLM inference incurs significant computation and memory costs. Recent approaches…
External link:
http://arxiv.org/abs/2402.12280
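To put the "70B+" figure in perspective, here is a back-of-the-envelope estimate of weight storage alone, assuming 16-bit (2-byte) parameters and ignoring the KV cache and activations:

```latex
\text{weight memory} = N_{\text{params}} \times \text{bytes per parameter}
  = 70 \times 10^{9} \times 2\,\text{B}
  = 1.4 \times 10^{11}\,\text{B}
  \approx 140\,\text{GB}
```

That already exceeds the memory of a single 80 GB GPU before any per-request state is counted.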
Authors:
Lu, Yao, Bian, Song, Chen, Lequn, He, Yongjun, Hui, Yulong, Lentz, Matthew, Li, Beibin, Liu, Fei, Li, Jialin, Liu, Qi, Liu, Rui, Liu, Xiaoxuan, Ma, Lin, Rong, Kexin, Wang, Jianguo, Wu, Yingjun, Wu, Yongji, Zhang, Huanchen, Zhang, Minjia, Zhang, Qizhen, Zhou, Tianyi, Zhuo, Danyang
In this paper, we investigate the intersection of large generative AI models and cloud-native computing architectures. Recent large models such as ChatGPT, while revolutionary in their capabilities, face challenges like escalating costs and demand fo…
External link:
http://arxiv.org/abs/2401.12230
Augmented reality technology has been widely used in industrial design interaction, exhibition guide, information retrieval and other fields. The combination of artificial intelligence and augmented reality technology has also become a future develop…
External link:
http://arxiv.org/abs/2311.12430
Low-rank adaptation (LoRA) has become an important and popular method to adapt pre-trained models to specific domains. We present Punica, a system to serve multiple LoRA models in a shared GPU cluster. Punica contains a new CUDA kernel design that al…
External link:
http://arxiv.org/abs/2310.18547
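LoRA keeps the pre-trained weight matrix W frozen and learns a low-rank update, so a layer computes h = W x + B(A x), where A is r × d_in and B is d_out × r for a small rank r. A minimal pure-Python sketch, assuming that standard formulation with an optional scalar `scale`, shows why serving many adapters together is attractive: every request in a mixed batch reuses the same base weights and differs only in its small A and B. This is not Punica's CUDA kernel; `lora_batch_forward` and the adapter dictionary are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_batch_forward(
    base_w: Matrix,                               # frozen W, shared by every request
    adapters: Dict[str, Tuple[Matrix, Matrix]],   # hypothetical: name -> (A: r x d_in, B: d_out x r)
    batch: Sequence[Tuple[str, Vector]],          # (adapter name, input vector) per request
    scale: float = 1.0,
) -> List[Vector]:
    """Apply the shared base layer plus each request's own low-rank delta."""
    outputs = []
    for name, x in batch:
        base_out = matvec(base_w, x)        # identical weights regardless of adapter
        a, b = adapters[name]
        delta = matvec(b, matvec(a, x))     # rank-r update: B (A x)
        outputs.append([h + scale * d for h, d in zip(base_out, delta)])
    return outputs


if __name__ == "__main__":
    w = [[1.0, 0.0], [0.0, 1.0]]
    adapters = {
        "a": ([[1.0, 1.0]], [[1.0], [0.0]]),  # rank-1 adapter
        "b": ([[0.0, 1.0]], [[0.0], [2.0]]),  # another rank-1 adapter
    }
    print(lora_batch_forward(w, adapters, [("a", [2.0, 3.0]), ("b", [1.0, 1.0])]))
```

A GPU implementation would typically compute the base product once for the whole batch as a single matrix multiplication and handle the per-adapter products separately; the loop here keeps both parts per request only for readability.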
Authors:
Chen, Jingrong, Wu, Yongji, Lin, Shihan, Xu, Yechen, Kong, Xinhao, Anderson, Thomas, Lentz, Matthew, Yang, Xiaowei, Zhuo, Danyang
Remote Procedure Call (RPC) is a widely used abstraction for cloud computing. The programmer specifies type information for each remote procedure, and a compiler generates stub code linked into each application to marshal and unmarshal arguments into…
External link:
http://arxiv.org/abs/2304.07349
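To make the stub-generation step concrete, here is a hand-written sketch of what generated stubs might look like for a hypothetical procedure `add(a: int32, b: int32) -> int64`. The wire format (a little-endian header carrying a procedure id and a call id, followed by fixed-size arguments) and every name below are assumptions for illustration, not the format of any real RPC framework or of the paper's system.

```python
import struct
from typing import Tuple

# Hypothetical wire format: header = (procedure id: uint16, call id: uint32), little-endian.
HEADER = struct.Struct("<HI")
ADD_ARGS = struct.Struct("<ii")    # add(a: int32, b: int32)
ADD_RESULT = struct.Struct("<q")   # -> int64
ADD_PROC_ID = 1


def marshal_add_request(call_id: int, a: int, b: int) -> bytes:
    """Client stub: encode procedure id, call id and typed arguments into bytes."""
    return HEADER.pack(ADD_PROC_ID, call_id) + ADD_ARGS.pack(a, b)


def unmarshal_add_request(msg: bytes) -> Tuple[int, int, int]:
    """Server stub: decode header and arguments; returns (call_id, a, b)."""
    proc_id, call_id = HEADER.unpack_from(msg, 0)
    if proc_id != ADD_PROC_ID:
        raise ValueError(f"unexpected procedure id {proc_id}")
    a, b = ADD_ARGS.unpack_from(msg, HEADER.size)
    return call_id, a, b


def handle(msg: bytes) -> bytes:
    """Server dispatch: unmarshal, run the procedure, marshal the reply."""
    call_id, a, b = unmarshal_add_request(msg)
    return HEADER.pack(ADD_PROC_ID, call_id) + ADD_RESULT.pack(a + b)


def unmarshal_add_reply(msg: bytes) -> Tuple[int, int]:
    """Client stub: decode the reply; returns (call_id, result)."""
    _, call_id = HEADER.unpack_from(msg, 0)
    (result,) = ADD_RESULT.unpack_from(msg, HEADER.size)
    return call_id, result


if __name__ == "__main__":
    request = marshal_add_request(call_id=7, a=2, b=40)
    print(len(request), unmarshal_add_reply(handle(request)))
```

Because the argument types are known ahead of time, the stubs can use fixed-size encodings with no per-field type tags; that static type information is what a stub compiler works from.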
With the advent of ubiquitous deployment of smart devices and the Internet of Things, data sources for machine learning inference have increasingly moved to the edge of the network. Existing machine learning inference platforms typically assume a hom…
External link:
http://arxiv.org/abs/2205.04713
Published in:
Sensors and Actuators: B. Chemical, 1 December 2024, 420
Local Differential Privacy (LDP) protocols enable an untrusted server to perform privacy-preserving, federated data analytics. Various LDP protocols have been developed for different types of data such as categorical data, numerical data, and key-val…
External link:
http://arxiv.org/abs/2111.11534
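The simplest LDP mechanism for a single yes/no attribute is randomized response. A minimal sketch, assuming binary data and the classic mechanism (not necessarily any protocol studied in the paper): each client reports its true bit with probability p = e^ε / (1 + e^ε) and the flipped bit otherwise, so the ratio p / (1 - p) = e^ε bounds what any single report reveals; the server then inverts the known noise to estimate the population frequency. The function names are hypothetical.

```python
import math
import random
from typing import Iterable


def keep_probability(epsilon: float) -> float:
    """Probability of reporting the true bit; p / (1 - p) = e**epsilon gives epsilon-LDP."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomize(bit: int, epsilon: float, rng: random.Random) -> int:
    """Client side: keep the true bit with probability p, otherwise flip it."""
    return bit if rng.random() < keep_probability(epsilon) else 1 - bit


def estimate_frequency(reports: Iterable[int], epsilon: float) -> float:
    """Server side: unbiased estimate of the fraction of clients whose true bit is 1.

    Requires epsilon > 0 so that 2p - 1 > 0.
    """
    values = list(reports)
    p = keep_probability(epsilon)
    observed = sum(values) / len(values)
    # E[observed] = f * p + (1 - f) * (1 - p) = f * (2p - 1) + (1 - p); solve for f.
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)


if __name__ == "__main__":
    rng = random.Random(0)
    true_bits = [1] * 30_000 + [0] * 70_000  # 30% of clients hold a 1
    reports = [randomize(b, 1.0, rng) for b in true_bits]
    print(f"estimated frequency: {estimate_frequency(reports, 1.0):.3f}")
```

Smaller ε means noisier individual reports, so more clients are needed to reach the same estimation accuracy.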