Showing 1 - 10 of 25 results for search: '"Zhang, Quanlu"'
Author:
Xu, Si, Huang, Zixiao, Zeng, Yan, Yan, Shengen, Ning, Xuefei, Zhang, Quanlu, Ye, Haolin, Gu, Sipei, Shui, Chunsheng, Lin, Zhezheng, Zhang, Hao, Wang, Sheng, Dai, Guohao, Wang, Yu
Training large-scale models relies on a vast number of computing resources. For example, training the GPT-4 model (1.8 trillion parameters) requires 25,000 A100 GPUs. It is a challenge to build a large-scale cluster with one type of GPU accelerator.
External link:
http://arxiv.org/abs/2405.16256
Author:
Sun, Yutao, Dong, Li, Zhu, Yi, Huang, Shaohan, Wang, Wenhui, Ma, Shuming, Zhang, Quanlu, Wang, Jianyong, Wei, Furu
We introduce a decoder-decoder architecture, YOCO, for large language models, which only caches key-value pairs once. It consists of two components, i.e., a cross-decoder stacked upon a self-decoder. The self-decoder efficiently encodes global key-value (KV) caches that are reused by the cross-decoder via cross-attention… (a sketch of the cached-once idea follows this entry)
External link:
http://arxiv.org/abs/2405.05254
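Below is a minimal, illustrative sketch of the cached-once idea from the YOCO abstract: a self-decoder stack produces one global KV cache that every cross-decoder layer reuses via cross-attention. It assumes stock PyTorch modules, omits causal masking and YOCO's efficient self-attention variants, and all names (YOCOSketch, kv_proj) are hypothetical, not the paper's code.

```python
import torch
import torch.nn as nn

class YOCOSketch(nn.Module):
    """Toy decoder-decoder: KV pairs are cached once by the self-decoder
    and reused by every cross-decoder layer (names are illustrative)."""

    def __init__(self, d_model=64, n_heads=4, n_self=2, n_cross=2):
        super().__init__()
        self.self_layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
             for _ in range(n_self)]
        )
        self.kv_proj = nn.Linear(d_model, 2 * d_model)   # one global KV cache
        self.cross_attn = nn.ModuleList(
            [nn.MultiheadAttention(d_model, n_heads, batch_first=True)
             for _ in range(n_cross)]
        )

    def forward(self, x):
        h = x
        for layer in self.self_layers:           # self-decoder pass
            h = layer(h)
        k, v = self.kv_proj(h).chunk(2, dim=-1)  # compute and cache KV once
        q = x
        for attn in self.cross_attn:             # cross-decoder reuses the cache
            q = q + attn(q, k, v, need_weights=False)[0]
        return q

out = YOCOSketch()(torch.randn(1, 8, 64))  # (batch, seq, d_model)
```

Because K and V are computed a single time, inference memory grows with one shared cache rather than one cache per decoder layer, which is the abstract's central point.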
Author:
Wan, Zhongwei, Wang, Xin, Liu, Che, Alam, Samiul, Zheng, Yu, Liu, Jiachen, Qu, Zhongnan, Yan, Shen, Zhu, Yi, Zhang, Quanlu, Chowdhury, Mosharaf, Zhang, Mi
Large Language Models (LLMs) have demonstrated remarkable capabilities in important tasks such as natural language understanding and language generation, and thus have the potential to make a substantial impact on our society. Such capabilities, however…
External link:
http://arxiv.org/abs/2312.03863
Vision Transformers have shown great performance in single tasks such as classification and segmentation. However, real-world problems are not isolated, which calls for vision transformers that can perform multiple tasks concurrently. Existing multi-task…
External link:
http://arxiv.org/abs/2304.08756
Author:
Tang, Chen, Zhang, Li Lyna, Jiang, Huiqiang, Xu, Jiahang, Cao, Ting, Zhang, Quanlu, Yang, Yuqing, Wang, Zhi, Yang, Mao
Neural Architecture Search (NAS) has shown promising performance in the automatic design of vision transformers (ViT) exceeding 1G FLOPs. However, designing lightweight and low-latency ViT models for diverse mobile devices remains a big challenge…
External link:
http://arxiv.org/abs/2303.09730
Author:
Zhang, Li Lyna, Wang, Xudong, Xu, Jiahang, Zhang, Quanlu, Wang, Yujing, Yang, Yuqing, Zheng, Ningxin, Cao, Ting, Yang, Mao
The combination of Neural Architecture Search (NAS) and quantization has proven successful in automatically designing low-FLOPs INT8 quantized neural networks (QNN). However, directly applying NAS to design accurate QNN models that achieve low latency…
External link:
http://arxiv.org/abs/2303.08308
Author:
Zheng, Ningxin, Jiang, Huiqiang, Zhang, Quanlu, Han, Zhenhua, Yang, Yuqing, Ma, Lingxiao, Yang, Fan, Zhang, Chengruidong, Qiu, Lili, Yang, Mao, Zhou, Lidong
Dynamic sparsity, where the sparsity patterns are unknown until runtime, poses a significant challenge to deep learning. The state-of-the-art sparsity-aware deep learning solutions are restricted to pre-defined, static sparsity patterns due to significant… (a sketch of runtime-determined sparsity follows this entry)
External link:
http://arxiv.org/abs/2301.10936
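As a generic illustration of dynamic sparsity (not the paper's system), the sketch below only discovers which rows are active once the data arrives at runtime, then runs dense compute on the gathered rows; dynamic_sparse_matmul and the threshold parameter are hypothetical names introduced here.

```python
import torch

def dynamic_sparse_matmul(x, w, threshold=0.0):
    """Multiply only the rows of x that turn out to be active at runtime.
    The sparsity pattern is unknown until the input data arrives."""
    active = (x.abs().sum(dim=1) > threshold).nonzero(as_tuple=True)[0]
    out = torch.zeros(x.size(0), w.size(1))
    out[active] = x[active] @ w          # dense compute on the gathered rows
    return out

x = torch.randn(16, 8)
x[torch.rand(16) < 0.5] = 0.0            # runtime-dependent sparsity
y = dynamic_sparse_matmul(x, torch.randn(8, 4))
```

A static, pre-defined sparsity pattern could be compiled ahead of time; here the gather step must happen per input, which is exactly the overhead the abstract refers to.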
Author:
Lin, Zhiqi, Miao, Youshan, Liu, Guodong, Shi, Xiaoxiang, Zhang, Quanlu, Yang, Fan, Maleki, Saeed, Zhu, Yi, Cao, Xu, Li, Cheng, Yang, Mao, Zhang, Lintao, Zhou, Lidong
With growing model sizes, deep neural networks (DNNs) are increasingly trained on massive numbers of GPU accelerators, which demands a proper parallelization plan that transforms a DNN model into fine-grained tasks and then schedules them to GPUs for execution… (a toy parallelization plan follows this entry)
External link:
http://arxiv.org/abs/2301.08984
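For intuition about what a parallelization plan does, here is a toy sketch (not the paper's planner) that transforms one linear operator into fine-grained per-device tasks by splitting its weight columns, then gathers the partial results; column_parallel_linear and the CPU stand-in devices are assumptions made for this example.

```python
import torch

def column_parallel_linear(x, weight, devices):
    """Illustrative tensor-parallel plan: split one linear operator's weight
    by columns into per-device tasks, run them, and concatenate the results.
    A real planner would also explore data- and pipeline-parallel dimensions."""
    shards = weight.chunk(len(devices), dim=1)        # fine-grained tasks
    parts = [(x.to(d) @ s.to(d)).cpu() for d, s in zip(devices, shards)]
    return torch.cat(parts, dim=1)

devices = ["cpu", "cpu"]   # stand-ins for GPUs in this sketch
y = column_parallel_linear(torch.randn(4, 8), torch.randn(8, 6), devices)
```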
Author:
Guo, Cong, Qiu, Yuxian, Leng, Jingwen, Zhang, Chen, Cao, Ying, Zhang, Quanlu, Liu, Yunxin, Yang, Fan, Guo, Minyi
An activation function is an element-wise mathematical function that plays a crucial role in deep neural networks (DNNs). Many novel and sophisticated activation functions have been proposed to improve DNN accuracy but also consume massive memory in training… (a sketch of this memory trade-off follows this entry)
External link:
http://arxiv.org/abs/2209.10778
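To illustrate the activation-memory issue generically, the sketch below uses standard gradient checkpointing (a well-known technique, not the paper's method): instead of keeping an element-wise block's inputs alive for backpropagation, it recomputes them during the backward pass, trading compute for activation memory.

```python
import torch
import torch.nn.functional as F
from torch.utils.checkpoint import checkpoint

# Element-wise activations normally keep their inputs alive until backward.
def block(x):
    return F.gelu(x) * x.sigmoid()

x = torch.randn(1024, 1024, requires_grad=True)
y = checkpoint(block, x, use_reentrant=False)   # block is re-run in backward
y.sum().backward()
```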
Author:
Yan, Chenqian, Zhang, Yuge, Zhang, Quanlu, Yang, Yaming, Jiang, Xinyang, Yang, Yuqing, Wang, Baoyuan
Despite the impressive progress of general face detection, the tuning of hyper-parameters and architectures is still critical for the performance of a domain-specific face detector. Though existing AutoML works can speed up this process, they either…
External link:
http://arxiv.org/abs/2203.08399