Search results

Report

RazorAttention: Efficient KV Cache Compression Through Retrieval Heads

Authors: Tang, Hanlin; Lin, Yang; Lin, Jing; Han, Qingsen; Hong, Shikuan; Yao, Yiwu; Wang, Gongyi

The memory and computational demands of Key-Value (KV) cache present significant challenges for deploying long-context language models. Previous approaches attempt to mitigate this issue by selectively dropping tokens, which irreversibly erases criti

External link: http://arxiv.org/abs/2407.15891

Report

Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLMs

Authors: Zhang, Yuxin; Zhao, Lirui; Lin, Mingbao; Sun, Yunyun; Yao, Yiwu; Han, Xingjia; Tanner, Jared; Liu, Shiwei; Ji, Rongrong

The ever-increasing large language models (LLMs), though opening a potential path for the upcoming artificial general intelligence, sadly drops a daunting obstacle on the way towards their on-device deployment. As one of the most well-established pre

External link: http://arxiv.org/abs/2310.08915

Academic article

Multi-teacher Contrastive Knowledge Inversion for Data-Free Distillation

Authors: LIN Zhenyuan, LIN Shaohui, YAO Yiwu, HE Gaoqi, WANG Changbo, MA Lizhuang

Published in: Jisuanji kexue yu tansuo, Vol 17, Iss 11, Pp 2721-2733 (2023)

Knowledge distillation is an effective method for model compression with access to training data. However, due to privacy, confidentiality, or transmission limitations, people cannot get the support of data. Existing data-free knowledge distillation

External link: https://doaj.org/article/50eee61b84a6498f83db78e5db4c0885

Report

Extremely Low Footprint End-to-End ASR System for Smart Device

Authors: Gao, Zhifu; Yao, Yiwu; Zhang, Shiliang; Yang, Jun; Lei, Ming; McLoughlin, Ian

Recently, end-to-end (E2E) speech recognition has become popular, since it can integrate the acoustic, pronunciation and language models into a single neural network, which outperforms conventional models. Among E2E approaches, attention-based models

External link: http://arxiv.org/abs/2104.05784

Report

INT8 Winograd Acceleration for Conv1D Equipped ASR Models Deployed on Mobile Devices

Authors: Yao, Yiwu; Li, Yuchao; Wang, Chengyu; Yu, Tianhang; Chen, Houjiang; Jiang, Xiaotang; Yang, Jun; Huang, Jun; Lin, Wei; Shu, Hui; Lv, Chengfei

The intensive computation of Automatic Speech Recognition (ASR) models obstructs them from being deployed on mobile devices. In this paper, we present a novel quantized Winograd optimization pipeline, which combines the quantization and fast convolut

External link: http://arxiv.org/abs/2010.14841

Report

Fully Parallel Architecture for Semi-global Stereo Matching with Refined Rank Method

Authors: Yao, Yiwu; Cheng, Yuhua

Fully parallel architecture at disparity-level for efficient semi-global matching (SGM) with refined rank method is presented. The improved SGM algorithm is implemented with the non-parametric unified rank model which is the combination of Rank filte

External link: http://arxiv.org/abs/1905.03716

Report

Creating Lightweight Object Detectors with Model Compression for Deployment on Edge Devices

Authors: Yao, Yiwu; Yang, Weiqiang; Zhu, Haoqi

To achieve lightweight object detectors for deployment on the edge devices, an effective model compression pipeline is proposed in this paper. The compression pipeline consists of automatic channel pruning for the backbone, fixed channel deletion for

External link: http://arxiv.org/abs/1905.01787
