Showing 1 - 10 of 780
for search: '"Kim, Sehoon"'
Long context inference presents challenges at the system level with increased compute and memory requirements, as well as from an accuracy perspective in being able to reason over long contexts. Recently, several methods have been proposed to compres…
External link:
http://arxiv.org/abs/2407.08892
Author:
Lee, Nicholas, Wattanawong, Thanakul, Kim, Sehoon, Mangalam, Karttikeya, Shen, Sheng, Anumanchipalli, Gopala, Mahoney, Michael W., Keutzer, Kurt, Gholami, Amir
Pretrained large language models (LLMs) are currently state-of-the-art for solving the vast majority of natural language processing tasks. While many real-world applications still require fine-tuning to reach satisfactory levels of performance, many…
External link:
http://arxiv.org/abs/2403.15042
The availability of unprecedented unsupervised training data, along with neural scaling laws, has resulted in an unprecedented surge in model size and compute requirements for serving/training LLMs. However, the main performance bottleneck is increas…
External link:
http://arxiv.org/abs/2403.14123
Author:
Hooper, Coleman, Kim, Sehoon, Mohammadzadeh, Hiva, Mahoney, Michael W., Shao, Yakun Sophia, Keutzer, Kurt, Gholami, Amir
LLMs are seeing growing use for applications such as document analysis and summarization which require large context windows, and with these large context windows KV cache activations surface as the dominant contributor to memory consumption during i…
External link:
http://arxiv.org/abs/2401.18079
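As rough context for the claim above that KV cache activations dominate memory at long context lengths, the sketch below estimates KV cache size with a back-of-the-envelope formula. The model configuration (32 layers, 32 KV heads, head dimension 128, fp16) is a hypothetical LLaMA-7B-like example chosen for illustration, not a figure taken from the paper.

```python
# Back-of-the-envelope KV cache size for a hypothetical transformer.
# Numbers are illustrative assumptions, not values from the cited work.
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, batch: int = 1, bytes_per_elem: int = 2) -> int:
    # Factor of 2 accounts for storing both keys and values;
    # bytes_per_elem=2 assumes fp16 activations.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem

# A LLaMA-7B-like config at a 128k-token context:
gb = kv_cache_bytes(32, 32, 128, seq_len=128_000) / 1e9
print(f"{gb:.0f} GB")  # -> 67 GB, far exceeding the model weights in fp16
```

The linear growth in `seq_len` is why long-context serving work (such as the KV cache quantization described in this entry) targets these activations rather than the weights.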
Many applications must provide low-latency LLM service to users or risk unacceptable user experience. However, over-provisioning resources to serve fluctuating request patterns is often prohibitively expensive. In this work, we present a best-effort…
External link:
http://arxiv.org/abs/2401.07886
Author:
Kim, Sehoon, Moon, Suhong, Tabrizi, Ryan, Lee, Nicholas, Mahoney, Michael W., Keutzer, Kurt, Gholami, Amir
The reasoning capabilities of the recent LLMs enable them to execute external function calls to overcome their inherent limitations, such as knowledge cutoffs, poor arithmetic skills, or lack of access to private data. This development has allowed LL…
External link:
http://arxiv.org/abs/2312.04511
Author:
Hooper, Coleman, Kim, Sehoon, Mohammadzadeh, Hiva, Genc, Hasan, Keutzer, Kurt, Gholami, Amir, Shao, Sophia
Generative Large Language Models (LLMs) based on the Transformer architecture have recently emerged as a dominant foundation model for a wide range of Natural Language Processing tasks. Nevertheless, their application in real-time scenarios has been…
External link:
http://arxiv.org/abs/2310.12072
Author:
Kim, Sehoon, Hooper, Coleman, Gholami, Amir, Dong, Zhen, Li, Xiuyu, Shen, Sheng, Mahoney, Michael W., Keutzer, Kurt
Generative Large Language Models (LLMs) have demonstrated remarkable results for a wide range of tasks. However, deploying these models for inference has been a significant challenge due to their unprecedented resource requirements. This has forced e…
External link:
http://arxiv.org/abs/2306.07629
Author:
Kim, Sehoon, Hooper, Coleman, Wattanawong, Thanakul, Kang, Minwoo, Yan, Ruohan, Genc, Hasan, Dinh, Grace, Huang, Qijing, Keutzer, Kurt, Mahoney, Michael W., Shao, Yakun Sophia, Gholami, Amir
Published in:
Presented in Workshop on Architecture and System Support for Transformer Models (ASSYST) at ISCA 2023
Recent advances in state-of-the-art DNN architecture design have been moving toward Transformer models. These models achieve superior accuracy across a wide range of applications. This trend has been consistent over the past several years since Trans…
External link:
http://arxiv.org/abs/2302.14017
Author:
Kim, Sehoon, Mangalam, Karttikeya, Moon, Suhong, Malik, Jitendra, Mahoney, Michael W., Gholami, Amir, Keutzer, Kurt
The recent emergence of Large Language Models based on the Transformer architecture has enabled dramatic advancements in the field of Natural Language Processing. However, these models have long inference latency, which limits their deployment and ma…
External link:
http://arxiv.org/abs/2302.07863