Showing 1 - 2 of 2 for search: '"Tyukin, Georgy"'
The inference demand for LLMs has skyrocketed in recent months, and serving models with low latencies remains challenging due to the quadratic input-length complexity of the attention layers. In this work, we investigate the effect of dropping MLP an…
External link:
http://arxiv.org/abs/2407.15516
Author:
Tyukin, Georgy
Large Language Models are growing in size, and we expect them to continue to do so, as larger models train quicker. However, this increase in size will severely impact inference costs. Therefore model compression is important, to retain the performan…
External link:
http://arxiv.org/abs/2404.05741