Showing 1 - 10
of 204
for search: '"Luan, Zhongzhi"'
Large language models (LLMs) have emerged as important components across various fields, yet their training requires substantial computation resources and abundant labeled data. This poses a challenge for robustly training LLMs for individual users (cli…
External link:
http://arxiv.org/abs/2406.07925
Author:
Wang, Yiqing, Liu, Xiaoyan, Yang, Hailong, Yang, Xinyu, Wang, Pengbo, Liu, Yi, Luan, Zhongzhi, Qian, Depei
As modern HPC platforms become increasingly heterogeneous, it is challenging for programmers to fully exploit the massive parallelism such heterogeneity offers. Consequently, task-based runtime systems have been pr…
External link:
http://arxiv.org/abs/2404.03226
Author:
Wang, Siqi, Yang, Hailong, Wang, Xuezhu, Liu, Tongxuan, Wang, Pengbo, Liang, Xuning, Ma, Kejie, Feng, Tianyu, You, Xin, Bao, Yongjun, Liu, Yi, Luan, Zhongzhi, Qian, Depei
Large language models (LLMs) have recently attracted surging interest due to their outstanding capabilities across various domains. However, enabling efficient LLM inference is challenging because of their autoregressive decoding, which generates tokens only…
External link:
http://arxiv.org/abs/2402.15678
The increasing volume of log data produced by software-intensive systems makes manual analysis impractical. Many deep learning-based methods have been proposed for log-based anomaly detection. These methods face several challenges, such as…
External link:
http://arxiv.org/abs/2309.01189
Large language models (LLMs) have recently garnered significant interest. With in-context learning, LLMs achieve impressive results in various natural language tasks. However, the application of LLMs to sentence embeddings remains an area of ongoing…
External link:
http://arxiv.org/abs/2307.16645
Modern systems produce large volumes of logs to record run-time status and events. System operators analyze these raw logs to track a system and extract information useful for diagnosing anomalies. One of the most important problems in t…
External link:
http://arxiv.org/abs/2303.11715
Author:
Liao, Jianjin, Li, Mingzhen, Sun, Qingxiao, Hao, Jiwei, Yu, Fengwei, Chen, Shengdong, Tao, Ye, Zhang, Zicheng, Yang, Hailong, Luan, Zhongzhi, Qian, Depei
Larger deep learning models usually lead to higher model quality with an ever-increasing GPU memory footprint. Although tensor checkpointing techniques have been proposed to enable training under a restricted GPU memory budget, the input tensor dynam…
External link:
http://arxiv.org/abs/2209.02478
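The tensor checkpointing (recomputation) technique this snippet builds on can be sketched generically: keep only every k-th activation during the forward pass, and recompute the missing ones from the nearest stored checkpoint when the backward pass needs them. This is an illustrative sketch with hypothetical layer functions, not the paper's system:

```python
# Checkpointing trades compute for memory: store a subset of activations,
# recompute the rest on demand from the nearest checkpoint.

def forward_checkpointed(layers, x, every=2):
    """Run layers on x, storing only every `every`-th activation."""
    saved = {0: x}                      # activation *before* layer i, keyed by i
    act = x
    for i, f in enumerate(layers):
        act = f(act)
        if (i + 1) % every == 0:
            saved[i + 1] = act          # checkpoint after layer i
    return act, saved

def recompute(layers, saved, i, every=2):
    """Recover the activation before layer i from the nearest checkpoint."""
    start = (i // every) * every        # nearest checkpoint at or before i
    act = saved[start]
    for j in range(start, i):
        act = layers[j](act)            # redo the forward work we didn't store
    return act

# Hypothetical "layers" standing in for network operators:
layers = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3, lambda v: v * v]
out, saved = forward_checkpointed(layers, 1)
```

With `every=2`, half the intermediate activations are never stored; the backward pass pays an extra forward segment per checkpoint interval instead.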
Author:
Li, Mingzhen, Xiao, Wencong, Sun, Biao, Zhao, Hanyu, Yang, Hailong, Ren, Shiru, Luan, Zhongzhi, Jia, Xianyan, Liu, Yi, Li, Yong, Lin, Wei, Qian, Depei
Distributed synchronized GPU training is commonly used for deep learning. The constraint of using a fixed number of GPUs makes large-scale training jobs suffer long queuing times for resource allocation and lowers cluster utilizatio…
External link:
http://arxiv.org/abs/2208.14228
Deploying various deep learning (DL) models efficiently has boosted research on DL compilers. The difficulty of generating optimized tensor code drives DL compilers to adopt auto-tuning approaches, and increasing demands require increas…
External link:
http://arxiv.org/abs/2201.00194
Although matrix multiplication plays a vital role in computational linear algebra, few efficient solutions exist for multiplying near-sparse matrices. The Sparse Approximate Matrix Multiply (SpAMM) is one of the algorithms to…
External link:
http://arxiv.org/abs/2103.13042
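The core SpAMM idea named in this snippet can be sketched in a few lines: multiply recursively by quadrants and skip any block product whose Frobenius-norm bound falls below a tolerance tau. This is a toy pure-Python sketch for power-of-two sizes, not the paper's optimized implementation:

```python
import math

# SpAMM sketch: ||A|| * ||B|| bounds ||A @ B|| (Frobenius norms), so block
# products below a tolerance tau can be dropped, which is what makes the
# algorithm fast on near-sparse matrices.

def frob(M):
    """Frobenius norm of a square matrix (list of lists)."""
    return math.sqrt(sum(x * x for row in M for x in row))

def zeros(n):
    return [[0.0] * n for _ in range(n)]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def dense_mm(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def quadrants(M):
    h = len(M) // 2
    return ([row[:h] for row in M[:h]], [row[h:] for row in M[:h]],
            [row[:h] for row in M[h:]], [row[h:] for row in M[h:]])

def join(C11, C12, C21, C22):
    return ([r1 + r2 for r1, r2 in zip(C11, C12)] +
            [r1 + r2 for r1, r2 in zip(C21, C22)])

def spamm(A, B, tau):
    if frob(A) * frob(B) < tau:          # negligible block product: skip it
        return zeros(len(A))
    if len(A) <= 2:                      # small base case: plain dense multiply
        return dense_mm(A, B)
    A11, A12, A21, A22 = quadrants(A)
    B11, B12, B21, B22 = quadrants(B)
    return join(
        add(spamm(A11, B11, tau), spamm(A12, B21, tau)),
        add(spamm(A11, B12, tau), spamm(A12, B22, tau)),
        add(spamm(A21, B11, tau), spamm(A22, B21, tau)),
        add(spamm(A21, B12, tau), spamm(A22, B22, tau)),
    )
```

With `tau = 0` this reduces to exact blocked matrix multiplication; raising tau trades accuracy for skipped work on low-norm blocks.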