Showing 1 - 10 of 45 for search: '"Cheng, Jianyi"'
Post-training quantization of Large Language Models (LLMs) has proven effective in reducing the computational requirements for running inference on these models. In this study, we focus on a straightforward question: When aiming for a specific accuracy…
External link:
http://arxiv.org/abs/2410.06722
Author:
Yu, Zhewen, Sreeram, Sudarshan, Agrawal, Krish, Wu, Junyi, Montgomerie-Corcoran, Alexander, Zhang, Cheng, Cheng, Jianyi, Bouganis, Christos-Savvas, Zhao, Yiren
Deep Neural Networks (DNNs) excel in learning hierarchical representations from raw data, such as images, audio, and text. To compute these DNN models with high performance and energy efficiency, these models are usually deployed onto customized hardware…
External link:
http://arxiv.org/abs/2406.03088
Post-training quantization of Large Language Models (LLMs) is challenging. In this work, we introduce Low-rank Quantization Error Reduction (LQER), which combines quantization and low-rank approximation to recover the model capability. LQER leverages…
External link:
http://arxiv.org/abs/2402.02446
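
The snippet above already names the core mechanism: quantize the weights, then patch the quantization error with a low-rank term, so that W ~ Q(W) + U·V^T. The following minimal C sketch illustrates that idea under stated assumptions: a symmetric 4-bit quantizer and a rank-1 correction found by power iteration, both of which are illustrative stand-ins rather than LQER's actual quantizer, rank, or scaling.

    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define M 8
    #define N 8

    /* Symmetric uniform quantizer to `bits` bits (illustrative stand-in). */
    static double quantize(double x, double scale, int bits) {
        int qmax = (1 << (bits - 1)) - 1;
        long q = lround(x / scale);
        if (q > qmax) q = qmax;
        if (q < -qmax) q = -qmax;
        return (double)q * scale;
    }

    int main(void) {
        double W[M][N], Wq[M][N], E[M][N], maxabs = 0.0;
        srand(0);
        for (int i = 0; i < M; i++)
            for (int j = 0; j < N; j++) {
                W[i][j] = (double)rand() / RAND_MAX - 0.5;
                if (fabs(W[i][j]) > maxabs) maxabs = fabs(W[i][j]);
            }

        const int bits = 4;
        double scale = maxabs / ((1 << (bits - 1)) - 1);
        for (int i = 0; i < M; i++)
            for (int j = 0; j < N; j++) {
                Wq[i][j] = quantize(W[i][j], scale, bits);
                E[i][j]  = W[i][j] - Wq[i][j];  /* quantization error */
            }

        /* Rank-1 approximation E ~ sigma * u * v^T via power iteration. */
        double u[M], v[N], sigma = 0.0;
        for (int j = 0; j < N; j++) v[j] = 1.0 / sqrt((double)N);
        for (int it = 0; it < 100; it++) {
            double norm = 0.0;
            for (int i = 0; i < M; i++) {        /* u <- E v, sigma <- |u| */
                u[i] = 0.0;
                for (int j = 0; j < N; j++) u[i] += E[i][j] * v[j];
                norm += u[i] * u[i];
            }
            sigma = sqrt(norm);
            for (int i = 0; i < M; i++) u[i] /= sigma;
            norm = 0.0;
            for (int j = 0; j < N; j++) {        /* v <- E^T u, renormalise */
                v[j] = 0.0;
                for (int i = 0; i < M; i++) v[j] += E[i][j] * u[i];
                norm += v[j] * v[j];
            }
            norm = sqrt(norm);
            for (int j = 0; j < N; j++) v[j] /= norm;
        }

        /* Residual with quantization alone vs. with the low-rank repair. */
        double err_q = 0.0, err_lq = 0.0;
        for (int i = 0; i < M; i++)
            for (int j = 0; j < N; j++) {
                double d = E[i][j], dr = d - sigma * u[i] * v[j];
                err_q  += d * d;
                err_lq += dr * dr;
            }
        printf("||W - Q(W)||          = %f\n", sqrt(err_q));
        printf("||W - Q(W) - s*uv^T|| = %f\n", sqrt(err_lq));
        return 0;
    }

Since the best rank-1 approximation of the error matrix can only shrink its Frobenius norm, the second residual printed should come out smaller than the first.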
The inference of large language models (LLMs) requires immense computation and memory resources. To curtail these costs, quantisation has emerged as a promising solution, but existing LLM quantisation mainly focuses on 8-bit. In this work, we explore…
External link:
http://arxiv.org/abs/2310.05079
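
As a rough illustration of why the move below 8 bits that this work explores is non-trivial, the C sketch below (a toy experiment, not the paper's method) quantises random values uniformly at 8, 6, 4, and 2 bits and prints the round-trip error, which roughly quadruples with every two bits removed.

    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        enum { N = 1024 };
        double x[N], maxabs = 0.0;
        srand(1);
        for (int i = 0; i < N; i++) {
            x[i] = (double)rand() / RAND_MAX - 0.5;
            if (fabs(x[i]) > maxabs) maxabs = fabs(x[i]);
        }
        for (int bits = 8; bits >= 2; bits -= 2) {
            int qmax = (1 << (bits - 1)) - 1;    /* symmetric code range */
            double scale = maxabs / qmax, mse = 0.0;
            for (int i = 0; i < N; i++) {
                long q = lround(x[i] / scale);   /* quantise ... */
                if (q > qmax) q = qmax;
                if (q < -qmax) q = -qmax;
                double err = x[i] - (double)q * scale;  /* ... and round-trip */
                mse += err * err / N;
            }
            printf("%d-bit: RMSE = %g\n", bits, sqrt(mse));
        }
        return 0;
    }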
High-level synthesis (HLS) refers to the automatic translation of a software program written in a high-level language into a hardware design. Modern HLS tools have moved away from the traditional approach of static (compile-time) scheduling of operations…
External link:
http://arxiv.org/abs/2308.11048
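
One classic shape of code where static, compile-time scheduling loses to dynamic scheduling (an illustrative example, not one taken from the paper) is a loop whose per-iteration latency depends on the data:

    /* The expensive multiply chain executes only on a data-dependent
     * condition, so per-iteration latency is unknown at compile time. */
    float accumulate(const float *a, int n) {
        float acc = 0.0f;
        for (int i = 0; i < n; i++) {
            float d = a[i];
            if (d >= 0.0f)
                acc += d;            /* common, cheap path */
            else
                acc += d * d * d;    /* rare, long-latency path */
        }
        return acc;
    }

A statically scheduled pipeline must fix the loop's initiation interval for the worst-case path on every iteration; a dynamically scheduled circuit can stream the cheap path at full rate and stall only when the rare long-latency path actually fires.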
High-level synthesis (HLS) is a process that automatically translates a software program in a high-level language into a low-level hardware description. However, the hardware designs produced by HLS tools still suffer from a significant performance gap…
External link:
http://arxiv.org/abs/2308.07654
Author:
Cheng, Jianyi, Zhang, Cheng, Yu, Zhewen, Bouganis, Christos-Savvas, Constantinides, George A., Zhao, Yiren
Model quantization represents both parameters (weights) and intermediate values (activations) in a more compact format, thereby directly reducing both computational and memory cost in hardware. The quantization of recent large language models (LLMs)…
External link:
http://arxiv.org/abs/2307.15517
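
For readers unfamiliar with the terminology in the snippet above, the following minimal C sketch shows the basic affine (scale plus zero-point) mapping that weight and activation quantization schemes build on; the uint8 code format and the calibration range are illustrative assumptions, not the paper's scheme.

    #include <math.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Map a real range [lo, hi] onto uint8 codes 0..255. */
    static void calibrate(double lo, double hi, double *scale, int *zp) {
        *scale = (hi - lo) / 255.0;
        *zp = (int)lround(-lo / *scale);   /* code that represents 0.0 */
    }

    static uint8_t quantize(double x, double scale, int zp) {
        long q = lround(x / scale) + zp;
        if (q < 0) q = 0;
        if (q > 255) q = 255;              /* clamp to the code range */
        return (uint8_t)q;
    }

    static double dequantize(uint8_t q, double scale, int zp) {
        return ((int)q - zp) * scale;
    }

    int main(void) {
        double scale; int zp;
        calibrate(-1.0, 3.0, &scale, &zp); /* e.g. an observed activation range */
        double x = 0.7;
        uint8_t q = quantize(x, scale, zp);
        printf("x=%g -> code %u -> %g\n", x, (unsigned)q, dequantize(q, scale, zp));
        return 0;
    }

The same mapping applies to weights and activations alike; only the calibration statistics differ, which is what makes one compact integer datapath serve both.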
With the ever-growing popularity of Artificial Intelligence, there is an increasing demand for more performant and efficient underlying hardware. Convolutional Neural Networks (CNNs) are a workload of particular importance, which achieve high accuracy…
External link:
http://arxiv.org/abs/2307.07821
Author:
Ye, Hanchen, Hao, Cong, Cheng, Jianyi, Jeong, Hyunmin, Huang, Jack, Neuendorffer, Stephen, Chen, Deming
High-level synthesis (HLS) has been widely adopted as it significantly improves the hardware design productivity and enables efficient design space exploration (DSE). Existing HLS tools are built using compiler infrastructures largely based on a single…
External link:
http://arxiv.org/abs/2107.11673
Author:
Zhao, Ruizhe, Cheng, Jianyi
Polyhedral optimisation, a methodology that views nested loops as polyhedra and searches for their optimal transformation regarding specific objectives (parallelism, locality, etc.), sounds promising for mitigating difficulties in automatically optimising…
External link:
http://arxiv.org/abs/2103.15103
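
To make "views nested loops as polyhedra and searches for their optimal transformation" concrete, the C sketch below shows the flavour of transformation such a tool derives automatically: tiling a loop nest for locality while visiting exactly the same iteration domain. The transpose kernel and the tile size are illustrative choices, not code from the paper.

    #define N 1024
    #define T 32   /* tile size; N must be a multiple of T here */

    /* Naive version: the column-wise writes to B stride through memory
     * and thrash the cache for large N. */
    void transpose_naive(double A[N][N], double B[N][N]) {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                B[j][i] = A[i][j];
    }

    /* Tiled version: the same N*N iteration domain, reordered so each
     * T x T tile of A and B stays cache-resident while it is touched. */
    void transpose_tiled(double A[N][N], double B[N][N]) {
        for (int ii = 0; ii < N; ii += T)
            for (int jj = 0; jj < N; jj += T)
                for (int i = ii; i < ii + T; i++)
                    for (int j = jj; j < jj + T; j++)
                        B[j][i] = A[i][j];
    }

Both functions perform the same assignments; only the iteration order changes, which is precisely the kind of schedule transformation the polyhedral model reasons about when optimising for parallelism or locality.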