Showing 1 - 10 of 514 for search: '"Cao, Shiyi"'
Online LLM inference powers many exciting applications such as intelligent chatbots and autonomous agents. Modern LLM inference engines widely rely on request batching to improve inference throughput, aiming to make it cost-efficient when running on …
External link:
http://arxiv.org/abs/2411.01142
This paper presents techniques for theoretically and practically efficient and scalable Schrödinger-style quantum circuit simulation. Our approach partitions a quantum circuit into a hierarchy of subcircuits and simulates the subcircuits on …
External link:
http://arxiv.org/abs/2408.09055
Author:
Jeon, Byungsoo, Wu, Mengdi, Cao, Shiyi, Kim, Sunghyun, Park, Sunghyun, Aggarwal, Neeraj, Unger, Colin, Arfeen, Daiyaan, Liao, Peiyuan, Miao, Xupeng, Alizadeh, Mohammad, Ganger, Gregory R., Chen, Tianqi, Jia, Zhihao
Deep neural networks (DNNs) continue to grow rapidly in size, making them infeasible to train on a single device. Pipeline parallelism is commonly used in existing DNN systems to support large-scale DNN training by partitioning a DNN into multiple …
External link:
http://arxiv.org/abs/2406.17145
Author:
Yang, Ling, Yu, Zhaochen, Zhang, Tianjun, Cao, Shiyi, Xu, Minkai, Zhang, Wentao, Gonzalez, Joseph E., Cui, Bin
We introduce Buffer of Thoughts (BoT), a novel and versatile thought-augmented reasoning approach for enhancing accuracy, efficiency and robustness of large language models (LLMs). Specifically, we propose meta-buffer to store a series of informative …
External link:
http://arxiv.org/abs/2406.04271
Author:
Liu, Shu, Biswal, Asim, Cheng, Audrey, Mo, Xiangxi, Cao, Shiyi, Gonzalez, Joseph E., Stoica, Ion, Zaharia, Matei
Analytical database providers (e.g., Redshift, Databricks, BigQuery) have rapidly added support for invoking Large Language Models (LLMs) through native user-defined functions (UDFs) to help users perform natural language tasks, such as …
External link:
http://arxiv.org/abs/2403.05821
Author:
Sheng, Ying, Cao, Shiyi, Li, Dacheng, Zhu, Banghua, Li, Zhuohan, Zhuo, Danyang, Gonzalez, Joseph E., Stoica, Ion
High-demand LLM inference services (e.g., ChatGPT and BARD) support a wide range of requests from short chat conversations to long document reading. To ensure that all client requests are processed fairly, most major LLM inference services have …
External link:
http://arxiv.org/abs/2401.00588
Author:
Zheng, Lianmin, Yin, Liangsheng, Xie, Zhiqiang, Sun, Chuyue, Huang, Jeff, Yu, Cody Hao, Cao, Shiyi, Kozyrakis, Christos, Stoica, Ion, Gonzalez, Joseph E., Barrett, Clark, Sheng, Ying
Large language models (LLMs) are increasingly used for complex tasks that require multiple generation calls, advanced prompting techniques, control flow, and structured inputs/outputs. However, efficient systems are lacking for programming and …
External link:
http://arxiv.org/abs/2312.07104
Author:
Sheng, Ying, Cao, Shiyi, Li, Dacheng, Hooper, Coleman, Lee, Nicholas, Yang, Shuo, Chou, Christopher, Zhu, Banghua, Zheng, Lianmin, Keutzer, Kurt, Gonzalez, Joseph E., Stoica, Ion
The "pretrain-then-finetune" paradigm is commonly adopted in the deployment of large language models. Low-Rank Adaptation (LoRA), a parameter-efficient fine-tuning method, is often employed to adapt a base model to a multitude of tasks, resulting in …
External link:
http://arxiv.org/abs/2311.03285
Author:
Gao, Tianyu, Wang, Xutao, Wang, Yanze, Xu, Tao, Yu, Wenbo, Liu, Yaping, Yang, Zhiqun, Huang, Zhanhua, Guo, Qiang, Zhou, Rui, Cao, Shiyi, Xiao, Xinhua, Huang, Qiushi, Sun, Wei, Yan, Min, Liu, Zhenhua, Zhang, Xianyu, Zhang, Lin
Published in:
In Optics Communications, 1 November 2024, 570
Author:
Lou, Yiling, Jiang, Qingqing, Huang, Shen, Xie, Yulin, Wang, Hengchang, Wang, Linlin, Wang, Shiqi, Xu, Minzhi, Lu, Zuxun, Wang, Furong, Cao, Shiyi
Published in:
In Journal of Affective Disorders, 1 January 2025, 368:789-797