Showing 1 - 1 of 1 for search: '"Bhagat, Aaryan"'
Author:
Kaushal, Ayush, Vaidhya, Tejas, Mondal, Arnab Kumar, Pandey, Tejas, Bhagat, Aaryan, Rish, Irina
Rapid advancements in GPU computational power have outpaced growth in memory capacity and bandwidth, creating bottlenecks in Large Language Model (LLM) inference. Post-training quantization is the leading method for addressing memory-related bottlenecks.
External link:
http://arxiv.org/abs/2407.12327