Showing 1 - 10
of 102
for search: '"Mai, Luo"'
Author:
Sun, Chuanhao, Triantafyllou, Thanos, Makris, Anthos, Drmač, Maja, Xu, Kai, Mai, Luo, Marina, Mahesh K.
View synthesis using Neural Radiance Fields (NeRF) and Gaussian Splatting (GS) has demonstrated impressive fidelity in rendering real-world scenarios. However, practical methods for accurate and efficient epistemic Uncertainty Quantification (UQ) in …
External link:
http://arxiv.org/abs/2410.05468
Author:
Sun, Chuanhao, Yuan, Zhihang, Xu, Kai, Mai, Luo, Siddharth, N., Chen, Shuo, Marina, Mahesh K.
Fourier features based positional encoding (PE) is commonly used in machine learning tasks that involve learning high-frequency features from low-dimensional inputs, such as 3D view synthesis and time series regression with neural tangent kernels. …
External link:
http://arxiv.org/abs/2407.09370
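The Fourier-features positional encoding mentioned in this abstract maps low-dimensional inputs through random sinusoidal projections before they reach the network. A minimal sketch is below; the Gaussian frequency matrix `B` and its scale are illustrative hyperparameters, not values taken from the paper:

```python
import numpy as np

def fourier_features(x, B):
    """Encode inputs x of shape (n, d) with a frequency matrix B of shape (d, m):
    gamma(x) = [cos(2*pi*x @ B), sin(2*pi*x @ B)], giving shape (n, 2m)."""
    proj = 2.0 * np.pi * x @ B
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=-1)

rng = np.random.default_rng(0)
B = rng.normal(scale=10.0, size=(1, 64))      # frequency scale: a tunable hyperparameter
x = np.linspace(0.0, 1.0, 5).reshape(-1, 1)   # 1-D inputs, e.g. time or a ray coordinate
feats = fourier_features(x, B)
print(feats.shape)  # (5, 128)
```

Downstream models (an MLP, or a kernel regressor) then consume `feats` instead of raw `x`, which makes high-frequency targets easier to fit.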
This paper presents MoE-Infinity, an offloading-efficient serving system for sparse mixture-of-experts (MoE) models. To optimize offloading, MoE-Infinity achieves novel request-level tracing for expert activation, capturing MoE's sparse execution …
External link:
http://arxiv.org/abs/2401.14361
Author:
Fu, Yao, Xue, Leyang, Huang, Yeqi, Brabete, Andrei-Octavian, Ustiugov, Dmitrii, Patel, Yuvraj, Mai, Luo
This paper presents ServerlessLLM, a distributed system designed to support low-latency serverless inference for Large Language Models (LLMs). By harnessing the substantial near-GPU storage and memory capacities of inference servers, ServerlessLLM …
External link:
http://arxiv.org/abs/2401.14351
Deep learning (DL) jobs use multi-dimensional parallelism, i.e., combining data, model, and pipeline parallelism, to use large GPU clusters efficiently. Long-running jobs may experience changes to their GPU allocation: (i) resource elasticity during …
External link:
http://arxiv.org/abs/2312.05181
Author:
Wang, Hanjing, Sit, Man-Kit, He, Congjie, Wen, Ying, Zhang, Weinan, Wang, Jun, Yang, Yaodong, Mai, Luo
Published in:
ICML2023
This paper introduces a distributed, GPU-centric experience replay system, GEAR, designed to perform scalable reinforcement learning (RL) with large sequence models (such as transformers). With such models, existing systems such as Reverb face …
External link:
http://arxiv.org/abs/2310.05205
Author:
Wen, Muning, Lin, Runji, Wang, Hanjing, Yang, Yaodong, Wen, Ying, Mai, Luo, Wang, Jun, Zhang, Haifeng, Zhang, Weinan
Transformer architectures have facilitated the development of large-scale and general-purpose sequence models for prediction tasks in natural language processing and computer vision, e.g., GPT-3 and Swin Transformer. Although originally designed for …
External link:
http://arxiv.org/abs/2306.13945
Author:
Tan, Zeyuan, Yuan, Xiulong, He, Congjie, Sit, Man-Kit, Li, Guo, Liu, Xiaoze, Ai, Baole, Zeng, Kai, Pietzuch, Peter, Mai, Luo
Systems for serving inference requests on graph neural networks (GNN) must combine low latency with high throughput, but they face irregular computation due to skew in the number of sampled graph nodes and aggregated GNN features. This makes it …
External link:
http://arxiv.org/abs/2305.10863
Recent years have witnessed a boom in differentiable optimization algorithms. These algorithms exhibit different execution patterns, and their execution needs massive computational resources that go beyond a single CPU and GPU. Existing …
External link:
http://arxiv.org/abs/2211.06934
Author:
Feng, Xidong, Liu, Bo, Ren, Jie, Mai, Luo, Zhu, Rui, Zhang, Haifeng, Wang, Jun, Yang, Yaodong
Gradient-based Meta-RL (GMRL) refers to methods that maintain two-level optimisation procedures wherein the outer-loop meta-learner guides the inner-loop gradient-based reinforcement learner to achieve fast adaptations. In this paper, we develop a …
External link:
http://arxiv.org/abs/2112.15400
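The two-level optimisation that GMRL refers to can be illustrated on a toy problem: the outer loop differentiates through an inner-loop gradient step. The quadratic inner and outer losses below are hypothetical stand-ins for the paper's RL objectives, chosen so the meta-gradient can be checked against finite differences:

```python
import numpy as np

alpha = 0.1  # inner-loop step size (illustrative)

def inner_update(theta, w):
    # Inner loss (theta - w)^2; one gradient step on theta.
    grad = 2.0 * (theta - w)
    return theta - alpha * grad

def outer_loss(w, theta0=0.0):
    # Outer objective evaluated at the adapted parameters.
    theta1 = inner_update(theta0, w)
    return theta1 ** 2

def meta_grad(w, theta0=0.0):
    # Chain rule through the inner update: d theta1 / d w = 2*alpha,
    # so d L_out / d w = 2 * theta1 * 2 * alpha.
    theta1 = inner_update(theta0, w)
    return 2.0 * theta1 * (2.0 * alpha)

# Sanity-check the analytic meta-gradient with central finite differences.
w, eps = 1.5, 1e-6
fd = (outer_loss(w + eps) - outer_loss(w - eps)) / (2.0 * eps)
print(abs(meta_grad(w) - fd) < 1e-6)  # True
```

In real GMRL methods the inner learner is a policy-gradient update and the differentiation is done by automatic differentiation; the structure of the computation, however, is the same two-level chain rule.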