Showing 1 - 10 of 165 results for the search: '"He Yuxiong"'
Published in:
E3S Web of Conferences, Vol 522, p 01048 (2024)
To enhance the network security of distributed power dispatching and control systems in the context of the “30·60” dual carbon goals, this study focuses on typical architectures for business networking of distributed power interface devices. We …
External link:
https://doaj.org/article/8b7c2c52703c4f4ebb07c90bfecfc1f9
LLM inference for popular enterprise use cases, such as summarization, RAG, and code generation, typically sees prompt lengths that are orders of magnitude longer than generation lengths. This characteristic leads to a high cost of prefill and increased res … (A brief illustrative sketch follows the link below.)
External link:
http://arxiv.org/abs/2410.03960
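The snippet above turns on prompt length dwarfing generation length. As a rough, hypothetical illustration of why prefill then dominates (dense attention FLOPs only; the model size and the 8k-prompt/256-token workload are assumed here, not taken from the paper):

```python
# Back-of-the-envelope sketch, not from the paper: attention FLOPs for prefill
# vs. token-by-token decoding with a KV cache.
def attention_flops(prompt_len: int, gen_len: int, hidden: int, layers: int) -> tuple[int, int]:
    # Prefill: every prompt token attends to the whole prompt at once.
    prefill = layers * 2 * hidden * prompt_len * prompt_len
    # Decode: each new token attends only to the growing cache (prompt + tokens so far).
    decode = layers * 2 * hidden * sum(prompt_len + i for i in range(gen_len))
    return prefill, decode

# Hypothetical RAG-style workload: 8k-token prompt, 256-token answer.
p, d = attention_flops(prompt_len=8192, gen_len=256, hidden=4096, layers=32)
print(f"prefill / decode attention FLOPs ≈ {p / d:.1f}x")
```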
Mixture-of-experts (MoEs) have been adopted to reduce inference costs by sparsely activating experts in large language models (LLMs). Despite this reduction, the massive number of experts in MoEs still makes them expensive to serve. In this paper, … (A minimal routing sketch follows the link below.)
External link:
http://arxiv.org/abs/2409.06211
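As a concrete picture of the "sparsely activating experts" idea the abstract mentions, here is a minimal top-k routing layer in NumPy; the sizes, router, and experts are made up for illustration, and this is not the paper's serving system:

```python
import numpy as np

# Minimal sketch of top-k expert routing: each token is processed by only
# top_k of the num_experts expert networks chosen by a learned gate.
rng = np.random.default_rng(0)
num_experts, d_model, top_k = 8, 16, 2

gate_w = rng.standard_normal((d_model, num_experts))              # router weights
experts = [rng.standard_normal((d_model, d_model)) for _ in range(num_experts)]

def moe_layer(x: np.ndarray) -> np.ndarray:
    logits = x @ gate_w                                            # (tokens, experts)
    top = np.argsort(logits, axis=-1)[:, -top_k:]                  # chosen experts per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        probs = np.exp(logits[t, top[t]])
        probs /= probs.sum()                                       # renormalize over the top-k
        for p, e in zip(probs, top[t]):
            out[t] += p * (x[t] @ experts[e])                      # only the chosen experts run
    return out

tokens = rng.standard_normal((4, d_model))
print(moe_layer(tokens).shape)   # (4, 16): each token used 2 of the 8 experts
```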
Model checkpoints are critical Deep Learning (DL) artifacts that enable fault tolerance for training and downstream applications, such as inference. However, writing checkpoints to persistent storage, and other I/O aspects of DL training, are mostly … (A generic asynchronous-checkpointing sketch follows the link below.)
External link:
http://arxiv.org/abs/2406.13768
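The abstract frames checkpoint writes as an I/O cost sitting on the training path. One common, generic mitigation, sketched below under the assumption of a small picklable state dict, is to snapshot state in memory and push the slow write to a background thread; this is an illustration, not the paper's method:

```python
import os, pickle, tempfile, threading

# Generic sketch of asynchronous checkpointing: snapshot state in memory, then
# write it to persistent storage off the critical path of training.
def snapshot(state: dict) -> dict:
    return {k: (v.copy() if hasattr(v, "copy") else v) for k, v in state.items()}

def write_checkpoint_async(state: dict, path: str) -> threading.Thread:
    snap = snapshot(state)                        # cheap in-memory copy on the critical path
    def _write() -> None:
        with open(path, "wb") as f:               # slow persistent write, in the background
            pickle.dump(snap, f)
    t = threading.Thread(target=_write)
    t.start()
    return t

# Toy "training" loop over a hypothetical state dict.
state = {"step": 0, "weights": [0.0] * 1000}
writers = []
for step in range(3):
    state["step"] = step
    state["weights"] = [w + 0.1 for w in state["weights"]]         # pretend parameter update
    path = os.path.join(tempfile.gettempdir(), f"ckpt_{step}.pkl")
    writers.append(write_checkpoint_async(state, path))            # returns immediately
for w in writers:
    w.join()                                                        # ensure all writes finished
```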
Author:
Xia, Haojun, Zheng, Zhen, Wu, Xiaoxia, Chen, Shiyang, Yao, Zhewei, Youn, Stephen, Bakhtiari, Arash, Wyatt, Michael, Zhuang, Donglin, Zhou, Zhongzhu, Ruwase, Olatunji, He, Yuxiong, Song, Shuaiwen Leon
Six-bit quantization (FP6) can effectively reduce the size of large language models (LLMs) and preserve model quality consistently across varied applications. However, existing systems do not provide Tensor Core support for FP6 quantization and s … (A small quantization sketch follows the link below.)
External link:
http://arxiv.org/abs/2401.14112
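To make the quantization step itself concrete, the sketch below simulates quantize/dequantize error on a signed 6-bit grid. Caveat: the paper's FP6 is a floating-point format served through custom Tensor Core kernels; the uniform integer grid here is only a simple stand-in for the mechanics:

```python
import numpy as np

# Illustrative "fake quantization" to a symmetric 6-bit grid: quantize then
# dequantize grouped weights and measure the induced error. This simulates the
# numerics only; the paper's Tensor Core FP6 kernels are not reproduced here.
def quant_dequant_6bit(w: np.ndarray, group_size: int = 64) -> np.ndarray:
    w = w.reshape(-1, group_size)
    scale = np.abs(w).max(axis=1, keepdims=True) / 31.0   # signed 6-bit range: [-32, 31]
    q = np.clip(np.round(w / scale), -32, 31)
    return (q * scale).reshape(-1)

rng = np.random.default_rng(0)
weights = rng.standard_normal(4096).astype(np.float32)
wq = quant_dequant_6bit(weights)
print("mean abs quantization error:", float(np.abs(weights - wq).mean()))
```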
Author:
Wu, Xiaoxia, Xia, Haojun, Youn, Stephen, Zheng, Zhen, Chen, Shiyang, Bakhtiari, Arash, Wyatt, Michael, Aminabadi, Reza Yazdani, He, Yuxiong, Ruwase, Olatunji, Song, Leon, Yao, Zhewei
This study examines 4-bit quantization methods such as GPTQ in large language models (LLMs), highlighting GPTQ's overfitting and limited enhancement in zero-shot tasks. While prior works focus merely on zero-shot measurement, we extend the task scope to …
External link:
http://arxiv.org/abs/2312.08583
Quantization techniques are pivotal in reducing the memory and computational demands of deep neural network inference. Existing solutions, such as ZeroQuant, offer dynamic quantization for models like BERT and GPT but overlook crucial memory-bounded … (A generic dynamic-quantization illustration follows the link below.)
External link:
http://arxiv.org/abs/2310.17723
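"Dynamic quantization", as used above, generally means that activation scales are computed from the live tensor at run time rather than calibrated offline. A generic per-token int8 illustration (not ZeroQuant's actual kernels):

```python
import numpy as np

# Generic dynamic quantization: one int8 scale per token, derived at run time
# from the activation values themselves.
def dynamic_int8_per_token(x: np.ndarray):
    scale = np.abs(x).max(axis=-1, keepdims=True) / 127.0   # live per-token scale
    scale = np.where(scale == 0, 1.0, scale)                 # guard all-zero tokens
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
acts = rng.standard_normal((4, 768)).astype(np.float32)     # hypothetical (tokens, hidden)
q, s = dynamic_int8_per_token(acts)
recon = q.astype(np.float32) * s
print("max reconstruction error:", float(np.abs(acts - recon).max()))
```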
Author:
Song, Shuaiwen Leon, Kruft, Bonnie, Zhang, Minjia, Li, Conglong, Chen, Shiyang, Zhang, Chengming, Tanaka, Masahiro, Wu, Xiaoxia, Rasley, Jeff, Awan, Ammar Ahmad, Holmes, Connor, Cai, Martin, Ghanem, Adam, Zhou, Zhongzhu, He, Yuxiong, Luferenko, Pete, Kumar, Divya, Weyn, Jonathan, Zhang, Ruixiong, Klocek, Sylwester, Vragov, Volodymyr, AlQuraishi, Mohammed, Ahdritz, Gustaf, Floristean, Christina, Negri, Cristina, Kotamarthi, Rao, Vishwanath, Venkatram, Ramanathan, Arvind, Foreman, Sam, Hippe, Kyle, Arcomano, Troy, Maulik, Romit, Zvyagin, Maxim, Brace, Alexander, Zhang, Bin, Bohorquez, Cindy Orozco, Clyde, Austin, Kale, Bharat, Perez-Rivera, Danilo, Ma, Heng, Mann, Carla M., Irvin, Michael, Pauloski, J. Gregory, Ward, Logan, Hayot, Valerie, Emani, Murali, Xie, Zhen, Lin, Diangen, Shukla, Maulik, Foster, Ian, Davis, James J., Papka, Michael E., Brettin, Thomas, Balaprakash, Prasanna, Tourassi, Gina, Gounley, John, Hanson, Heidi, Potok, Thomas E, Pasini, Massimiliano Lupo, Evans, Kate, Lu, Dan, Lunga, Dalton, Yin, Junqi, Dash, Sajal, Wang, Feiyi, Shankar, Mallikarjun, Lyngaas, Isaac, Wang, Xiao, Cong, Guojing, Zhang, Pei, Fan, Ming, Liu, Siyan, Hoisie, Adolfy, Yoo, Shinjae, Ren, Yihui, Tang, William, Felker, Kyle, Svyatkovskiy, Alexey, Liu, Hang, Aji, Ashwin, Dalton, Angela, Schulte, Michael, Schulz, Karl, Deng, Yuntian, Nie, Weili, Romero, Josh, Dallago, Christian, Vahdat, Arash, Xiao, Chaowei, Gibbs, Thomas, Anandkumar, Anima, Stevens, Rick
In the upcoming decade, deep learning may revolutionize the natural sciences, enhancing our capacity to model and predict natural occurrences. This could herald a new era of scientific exploration, bringing significant advancements across sectors from …
External link:
http://arxiv.org/abs/2310.04610
Author:
Jacobs, Sam Ade, Tanaka, Masahiro, Zhang, Chengming, Zhang, Minjia, Song, Shuaiwen Leon, Rajbhandari, Samyam, He, Yuxiong
Computation in a typical Transformer-based large language model (LLM) can be characterized by batch size, hidden dimension, number of layers, and sequence length. Until now, system works for accelerating LLM training have focused on the first three dimensions … (A rough cost sketch follows the link below.)
External link:
http://arxiv.org/abs/2309.14509
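The abstract's four-dimension characterization can be made concrete with a rough FLOPs model. The coefficients below are standard dense-Transformer approximations rather than the paper's, and the sizes are hypothetical; the point is only that cost grows super-linearly in sequence length:

```python
# Rough per-iteration FLOPs along the four dimensions the abstract lists;
# coefficients are generic dense-Transformer approximations, not the paper's.
def transformer_flops(batch: int, hidden: int, layers: int, seq: int) -> float:
    linear = 24 * batch * seq * hidden ** 2       # projections + MLP, linear in seq
    attention = 4 * batch * hidden * seq ** 2     # scores + weighted sum, quadratic in seq
    return layers * (linear + attention)

base = transformer_flops(batch=1, hidden=4096, layers=32, seq=4_096)
long = transformer_flops(batch=1, hidden=4096, layers=32, seq=65_536)
# Growing only the sequence dimension inflates cost super-linearly, which
# motivates systems that target this fourth dimension directly.
print(f"65k-token step ≈ {long / base:.0f}x the cost of a 4k-token step (16x more tokens)")
```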
Author:
Yao, Zhewei, Wu, Xiaoxia, Li, Conglong, Zhang, Minjia, Qin, Heyang, Ruwase, Olatunji, Awan, Ammar Ahmad, Rajbhandari, Samyam, He, Yuxiong
Most of the existing multi-modal models, hindered by their incapacity to adeptly manage interleaved image-and-text inputs in multi-image, multi-round dialogues, face substantial constraints in resource allocation for training and data accessibility, …
External link:
http://arxiv.org/abs/2309.14327