Showing 1 - 10 of 25 results for search: '"Zhang, Quanlu"'
Author:
Xu, Si, Huang, Zixiao, Zeng, Yan, Yan, Shengen, Ning, Xuefei, Zhang, Quanlu, Ye, Haolin, Gu, Sipei, Shui, Chunsheng, Lin, Zhezheng, Zhang, Hao, Wang, Sheng, Dai, Guohao, Wang, Yu
Training large-scale models relies on a vast number of computing resources. For example, training the GPT-4 model (1.8 trillion parameters) requires 25,000 A100 GPUs. It is a challenge to build a large-scale cluster with one type of GPU accelerator.
External link:
http://arxiv.org/abs/2405.16256
Author:
Sun, Yutao, Dong, Li, Zhu, Yi, Huang, Shaohan, Wang, Wenhui, Ma, Shuming, Zhang, Quanlu, Wang, Jianyong, Wei, Furu
We introduce a decoder-decoder architecture, YOCO, for large language models, which only caches key-value pairs once. It consists of two components, i.e., a cross-decoder stacked upon a self-decoder. The self-decoder efficiently encodes global key-value (KV) caches that are reused by the cross-decoder via cross-attention… (a sketch of the cached-once idea follows this entry)
External link:
http://arxiv.org/abs/2405.05254
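Below is a minimal, illustrative sketch of the cached-once idea from the YOCO abstract: a self-decoder stack produces one global KV cache that every cross-decoder layer reuses via cross-attention. It assumes stock PyTorch modules, omits causal masking and YOCO's efficient self-attention variants, and all names (YOCOSketch, kv_proj) are hypothetical, not the paper's code.

```python
import torch
import torch.nn as nn

class YOCOSketch(nn.Module):
    """Toy decoder-decoder: KV pairs are cached once by the self-decoder
    and reused by every cross-decoder layer (names are illustrative)."""

    def __init__(self, d_model=64, n_heads=4, n_self=2, n_cross=2):
        super().__init__()
        self.self_layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
             for _ in range(n_self)]
        )
        self.kv_proj = nn.Linear(d_model, 2 * d_model)   # one global KV cache
        self.cross_attn = nn.ModuleList(
            [nn.MultiheadAttention(d_model, n_heads, batch_first=True)
             for _ in range(n_cross)]
        )

    def forward(self, x):
        h = x
        for layer in self.self_layers:           # self-decoder pass
            h = layer(h)
        k, v = self.kv_proj(h).chunk(2, dim=-1)  # compute and cache KV once
        q = x
        for attn in self.cross_attn:             # cross-decoder reuses the cache
            q = q + attn(q, k, v, need_weights=False)[0]
        return q

out = YOCOSketch()(torch.randn(1, 8, 64))  # (batch, seq, d_model)
```

Because K and V are computed a single time, inference memory grows with one shared cache rather than one cache per decoder layer, which is the abstract's central point.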
Author:
Wan, Zhongwei, Wang, Xin, Liu, Che, Alam, Samiul, Zheng, Yu, Liu, Jiachen, Qu, Zhongnan, Yan, Shen, Zhu, Yi, Zhang, Quanlu, Chowdhury, Mosharaf, Zhang, Mi
Large Language Models (LLMs) have demonstrated remarkable capabilities in important tasks such as natural language understanding and language generation, and thus have the potential to make a substantial impact on our society. Such capabilities, however…
External link:
http://arxiv.org/abs/2312.03863
Vision Transformers have shown great performance in single tasks such as classification and segmentation. However, real-world problems are not isolated, which calls for vision transformers that can perform multiple tasks concurrently. Existing multi-task…
External link:
http://arxiv.org/abs/2304.08756
Author:
Tang, Chen, Zhang, Li Lyna, Jiang, Huiqiang, Xu, Jiahang, Cao, Ting, Zhang, Quanlu, Yang, Yuqing, Wang, Zhi, Yang, Mao
Neural Architecture Search (NAS) has shown promising performance in the automatic design of vision transformers (ViT) exceeding 1G FLOPs. However, designing lightweight and low-latency ViT models for diverse mobile devices remains a big challenge…
External link:
http://arxiv.org/abs/2303.09730
Author:
Zhang, Li Lyna, Wang, Xudong, Xu, Jiahang, Zhang, Quanlu, Wang, Yujing, Yang, Yuqing, Zheng, Ningxin, Cao, Ting, Yang, Mao
The combination of Neural Architecture Search (NAS) and quantization has proven successful in automatically designing low-FLOPs INT8 quantized neural networks (QNN). However, directly applying NAS to design accurate QNN models that achieve low latency…
External link:
http://arxiv.org/abs/2303.08308
Author:
Zheng, Ningxin, Jiang, Huiqiang, Zhang, Quanlu, Han, Zhenhua, Yang, Yuqing, Ma, Lingxiao, Yang, Fan, Zhang, Chengruidong, Qiu, Lili, Yang, Mao, Zhou, Lidong
Dynamic sparsity, where the sparsity patterns are unknown until runtime, poses a significant challenge to deep learning. The state-of-the-art sparsity-aware deep learning solutions are restricted to pre-defined, static sparsity patterns due to significant… (a sketch of runtime-determined sparsity follows this entry)
External link:
http://arxiv.org/abs/2301.10936
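As a generic illustration of dynamic sparsity (not the paper's system), the sketch below only discovers which rows are active once the data arrives at runtime, then runs dense compute on the gathered rows; dynamic_sparse_matmul and the threshold parameter are hypothetical names introduced here.

```python
import torch

def dynamic_sparse_matmul(x, w, threshold=0.0):
    """Multiply only the rows of x that turn out to be active at runtime.
    The sparsity pattern is unknown until the input data arrives."""
    active = (x.abs().sum(dim=1) > threshold).nonzero(as_tuple=True)[0]
    out = torch.zeros(x.size(0), w.size(1))
    out[active] = x[active] @ w          # dense compute on the gathered rows
    return out

x = torch.randn(16, 8)
x[torch.rand(16) < 0.5] = 0.0            # runtime-dependent sparsity
y = dynamic_sparse_matmul(x, torch.randn(8, 4))
```

A static, pre-defined sparsity pattern could be compiled ahead of time; here the gather step must happen per input, which is exactly the overhead the abstract refers to.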
Author:
Lin, Zhiqi, Miao, Youshan, Liu, Guodong, Shi, Xiaoxiang, Zhang, Quanlu, Yang, Fan, Maleki, Saeed, Zhu, Yi, Cao, Xu, Li, Cheng, Yang, Mao, Zhang, Lintao, Zhou, Lidong
With growing model sizes, deep neural networks (DNNs) are increasingly trained on massive numbers of GPU accelerators, which demands a proper parallelization plan that transforms a DNN model into fine-grained tasks and then schedules them to GPUs for execution… (a toy parallelization plan follows this entry)
External link:
http://arxiv.org/abs/2301.08984
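For intuition about what a parallelization plan does, here is a toy sketch (not the paper's planner) that transforms one linear operator into fine-grained per-device tasks by splitting its weight columns, then gathers the partial results; column_parallel_linear and the CPU stand-in devices are assumptions made for this example.

```python
import torch

def column_parallel_linear(x, weight, devices):
    """Illustrative tensor-parallel plan: split one linear operator's weight
    by columns into per-device tasks, run them, and concatenate the results.
    A real planner would also explore data- and pipeline-parallel dimensions."""
    shards = weight.chunk(len(devices), dim=1)        # fine-grained tasks
    parts = [(x.to(d) @ s.to(d)).cpu() for d, s in zip(devices, shards)]
    return torch.cat(parts, dim=1)

devices = ["cpu", "cpu"]   # stand-ins for GPUs in this sketch
y = column_parallel_linear(torch.randn(4, 8), torch.randn(8, 6), devices)
```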
Author:
Guo, Cong, Qiu, Yuxian, Leng, Jingwen, Zhang, Chen, Cao, Ying, Zhang, Quanlu, Liu, Yunxin, Yang, Fan, Guo, Minyi
An activation function is an element-wise mathematical function that plays a crucial role in deep neural networks (DNNs). Many novel and sophisticated activation functions have been proposed to improve DNN accuracy but also consume massive memory in training… (a sketch of this memory trade-off follows this entry)
External link:
http://arxiv.org/abs/2209.10778
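To illustrate the activation-memory issue generically, the sketch below uses standard gradient checkpointing (a well-known technique, not the paper's method): instead of keeping an element-wise block's inputs alive for backpropagation, it recomputes them during the backward pass, trading compute for activation memory.

```python
import torch
import torch.nn.functional as F
from torch.utils.checkpoint import checkpoint

# Element-wise activations normally keep their inputs alive until backward.
def block(x):
    return F.gelu(x) * x.sigmoid()

x = torch.randn(1024, 1024, requires_grad=True)
y = checkpoint(block, x, use_reentrant=False)   # block is re-run in backward
y.sum().backward()
```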
Author:
Yan, Chenqian, Zhang, Yuge, Zhang, Quanlu, Yang, Yaming, Jiang, Xinyang, Yang, Yuqing, Wang, Baoyuan
Despite the impressive progress of general face detection, the tuning of hyper-parameters and architectures is still critical for the performance of a domain-specific face detector. Though existing AutoML works can speed up this process, they either…
External link:
http://arxiv.org/abs/2203.08399