Showing 1 - 10 of 45 for search: '"Cheng, Jianyi"'
Post-training quantization of Large Language Models (LLMs) has proven effective in reducing the computational requirements for running inference on these models. In this study, we focus on a straightforward question: When aiming for a specific accuracy…
External link:
http://arxiv.org/abs/2410.06722
Author:
Yu, Zhewen, Sreeram, Sudarshan, Agrawal, Krish, Wu, Junyi, Montgomerie-Corcoran, Alexander, Zhang, Cheng, Cheng, Jianyi, Bouganis, Christos-Savvas, Zhao, Yiren
Deep Neural Networks (DNNs) excel in learning hierarchical representations from raw data, such as images, audio, and text. To compute these DNN models with high performance and energy efficiency, these models are usually deployed onto customized hardware…
External link:
http://arxiv.org/abs/2406.03088
Post-training quantization of Large Language Models (LLMs) is challenging. In this work, we introduce Low-rank Quantization Error Reduction (LQER), which combines quantization and low-rank approximation to recover the model capability. LQER leverages…
External link:
http://arxiv.org/abs/2402.02446
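
The snippet above already names the core mechanism: quantize the weights, then patch the quantization error with a low-rank term, so that W ~ Q(W) + U·V^T. The following minimal C sketch illustrates that idea under stated assumptions: a symmetric 4-bit quantizer and a rank-1 correction found by power iteration, both of which are illustrative stand-ins rather than LQER's actual quantizer, rank, or scaling.

    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define M 8
    #define N 8

    /* Symmetric uniform quantizer to `bits` bits (illustrative stand-in). */
    static double quantize(double x, double scale, int bits) {
        int qmax = (1 << (bits - 1)) - 1;
        long q = lround(x / scale);
        if (q > qmax) q = qmax;
        if (q < -qmax) q = -qmax;
        return (double)q * scale;
    }

    int main(void) {
        double W[M][N], Wq[M][N], E[M][N], maxabs = 0.0;
        srand(0);
        for (int i = 0; i < M; i++)
            for (int j = 0; j < N; j++) {
                W[i][j] = (double)rand() / RAND_MAX - 0.5;
                if (fabs(W[i][j]) > maxabs) maxabs = fabs(W[i][j]);
            }

        const int bits = 4;
        double scale = maxabs / ((1 << (bits - 1)) - 1);
        for (int i = 0; i < M; i++)
            for (int j = 0; j < N; j++) {
                Wq[i][j] = quantize(W[i][j], scale, bits);
                E[i][j]  = W[i][j] - Wq[i][j];  /* quantization error */
            }

        /* Rank-1 approximation E ~ sigma * u * v^T via power iteration. */
        double u[M], v[N], sigma = 0.0;
        for (int j = 0; j < N; j++) v[j] = 1.0 / sqrt((double)N);
        for (int it = 0; it < 100; it++) {
            double norm = 0.0;
            for (int i = 0; i < M; i++) {        /* u <- E v, sigma <- |u| */
                u[i] = 0.0;
                for (int j = 0; j < N; j++) u[i] += E[i][j] * v[j];
                norm += u[i] * u[i];
            }
            sigma = sqrt(norm);
            for (int i = 0; i < M; i++) u[i] /= sigma;
            norm = 0.0;
            for (int j = 0; j < N; j++) {        /* v <- E^T u, renormalise */
                v[j] = 0.0;
                for (int i = 0; i < M; i++) v[j] += E[i][j] * u[i];
                norm += v[j] * v[j];
            }
            norm = sqrt(norm);
            for (int j = 0; j < N; j++) v[j] /= norm;
        }

        /* Residual with quantization alone vs. with the low-rank repair. */
        double err_q = 0.0, err_lq = 0.0;
        for (int i = 0; i < M; i++)
            for (int j = 0; j < N; j++) {
                double d = E[i][j], dr = d - sigma * u[i] * v[j];
                err_q  += d * d;
                err_lq += dr * dr;
            }
        printf("||W - Q(W)||          = %f\n", sqrt(err_q));
        printf("||W - Q(W) - s*uv^T|| = %f\n", sqrt(err_lq));
        return 0;
    }

Since the best rank-1 approximation of the error matrix can only shrink its Frobenius norm, the second residual printed should come out smaller than the first.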
The inference of large language models (LLMs) requires immense computation and memory resources. To curtail these costs, quantisation has emerged as a promising solution, but existing LLM quantisation mainly focuses on 8-bit. In this work, we explore…
External link:
http://arxiv.org/abs/2310.05079
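
As a rough illustration of why the move below 8 bits that this work explores is non-trivial, the C sketch below (a toy experiment, not the paper's method) quantises random values uniformly at 8, 6, 4, and 2 bits and prints the round-trip error, which roughly quadruples with every two bits removed.

    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        enum { N = 1024 };
        double x[N], maxabs = 0.0;
        srand(1);
        for (int i = 0; i < N; i++) {
            x[i] = (double)rand() / RAND_MAX - 0.5;
            if (fabs(x[i]) > maxabs) maxabs = fabs(x[i]);
        }
        for (int bits = 8; bits >= 2; bits -= 2) {
            int qmax = (1 << (bits - 1)) - 1;    /* symmetric code range */
            double scale = maxabs / qmax, mse = 0.0;
            for (int i = 0; i < N; i++) {
                long q = lround(x[i] / scale);   /* quantise ... */
                if (q > qmax) q = qmax;
                if (q < -qmax) q = -qmax;
                double err = x[i] - (double)q * scale;  /* ... and round-trip */
                mse += err * err / N;
            }
            printf("%d-bit: RMSE = %g\n", bits, sqrt(mse));
        }
        return 0;
    }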
High-level synthesis (HLS) refers to the automatic translation of a software program written in a high-level language into a hardware design. Modern HLS tools have moved away from the traditional approach of static (compile-time) scheduling of operations…
External link:
http://arxiv.org/abs/2308.11048
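
One classic shape of code where static, compile-time scheduling loses to dynamic scheduling (an illustrative example, not one taken from the paper) is a loop whose per-iteration latency depends on the data:

    /* The expensive multiply chain executes only on a data-dependent
     * condition, so per-iteration latency is unknown at compile time. */
    float accumulate(const float *a, int n) {
        float acc = 0.0f;
        for (int i = 0; i < n; i++) {
            float d = a[i];
            if (d >= 0.0f)
                acc += d;            /* common, cheap path */
            else
                acc += d * d * d;    /* rare, long-latency path */
        }
        return acc;
    }

A statically scheduled pipeline must fix the loop's initiation interval for the worst-case path on every iteration; a dynamically scheduled circuit can stream the cheap path at full rate and stall only when the rare long-latency path actually fires.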
High-level synthesis (HLS) is a process that automatically translates a software program in a high-level language into a low-level hardware description. However, the hardware designs produced by HLS tools still suffer from a significant performance gap…
External link:
http://arxiv.org/abs/2308.07654
Author:
Cheng, Jianyi, Zhang, Cheng, Yu, Zhewen, Bouganis, Christos-Savvas, Constantinides, George A., Zhao, Yiren
Model quantization represents both parameters (weights) and intermediate values (activations) in a more compact format, thereby directly reducing both computational and memory cost in hardware. The quantization of recent large language models (LLMs)…
External link:
http://arxiv.org/abs/2307.15517
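
For readers unfamiliar with the terminology in the snippet above, the following minimal C sketch shows the basic affine (scale plus zero-point) mapping that weight and activation quantization schemes build on; the uint8 code format and the calibration range are illustrative assumptions, not the paper's scheme.

    #include <math.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Map a real range [lo, hi] onto uint8 codes 0..255. */
    static void calibrate(double lo, double hi, double *scale, int *zp) {
        *scale = (hi - lo) / 255.0;
        *zp = (int)lround(-lo / *scale);   /* code that represents 0.0 */
    }

    static uint8_t quantize(double x, double scale, int zp) {
        long q = lround(x / scale) + zp;
        if (q < 0) q = 0;
        if (q > 255) q = 255;              /* clamp to the code range */
        return (uint8_t)q;
    }

    static double dequantize(uint8_t q, double scale, int zp) {
        return ((int)q - zp) * scale;
    }

    int main(void) {
        double scale; int zp;
        calibrate(-1.0, 3.0, &scale, &zp); /* e.g. an observed activation range */
        double x = 0.7;
        uint8_t q = quantize(x, scale, zp);
        printf("x=%g -> code %u -> %g\n", x, (unsigned)q, dequantize(q, scale, zp));
        return 0;
    }

The same mapping applies to weights and activations alike; only the calibration statistics differ, which is what makes one compact integer datapath serve both.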
With the ever-growing popularity of Artificial Intelligence, there is an increasing demand for more performant and efficient underlying hardware. Convolutional Neural Networks (CNNs) are a workload of particular importance, which achieve high accuracy…
External link:
http://arxiv.org/abs/2307.07821
Author:
Ye, Hanchen, Hao, Cong, Cheng, Jianyi, Jeong, Hyunmin, Huang, Jack, Neuendorffer, Stephen, Chen, Deming
High-level synthesis (HLS) has been widely adopted as it significantly improves the hardware design productivity and enables efficient design space exploration (DSE). Existing HLS tools are built using compiler infrastructures largely based on a single…
External link:
http://arxiv.org/abs/2107.11673
Author:
Zhao, Ruizhe, Cheng, Jianyi
Polyhedral optimisation, a methodology that views nested loops as polyhedra and searches for their optimal transformation regarding specific objectives (parallelism, locality, etc.), sounds promising for mitigating difficulties in automatically optimising…
External link:
http://arxiv.org/abs/2103.15103
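
To make "views nested loops as polyhedra and searches for their optimal transformation" concrete, the C sketch below shows the flavour of transformation such a tool derives automatically: tiling a loop nest for locality while visiting exactly the same iteration domain. The transpose kernel and the tile size are illustrative choices, not code from the paper.

    #define N 1024
    #define T 32   /* tile size; N must be a multiple of T here */

    /* Naive version: the column-wise writes to B stride through memory
     * and thrash the cache for large N. */
    void transpose_naive(double A[N][N], double B[N][N]) {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                B[j][i] = A[i][j];
    }

    /* Tiled version: the same N*N iteration domain, reordered so each
     * T x T tile of A and B stays cache-resident while it is touched. */
    void transpose_tiled(double A[N][N], double B[N][N]) {
        for (int ii = 0; ii < N; ii += T)
            for (int jj = 0; jj < N; jj += T)
                for (int i = ii; i < ii + T; i++)
                    for (int j = jj; j < jj + T; j++)
                        B[j][i] = A[i][j];
    }

Both functions perform the same assignments; only the iteration order changes, which is precisely the kind of schedule transformation the polyhedral model reasons about when optimising for parallelism or locality.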