Showing 1 - 10 of 6,955 for search: '"Han, Song"'
As the demand for long-context large language models (LLMs) increases, models with context windows of up to 128K or 1M tokens are becoming increasingly prevalent. However, long-context LLM inference is challenging since the inference speed decreases…
External link:
http://arxiv.org/abs/2406.10774
Authors:
Ye, Hanrong, Huang, De-An, Lu, Yao, Yu, Zhiding, Ping, Wei, Tao, Andrew, Kautz, Jan, Han, Song, Xu, Dan, Molchanov, Pavlo, Yin, Hongxu
We introduce X-VILA, an omni-modality model designed to extend the capabilities of large language models (LLMs) by incorporating image, video, and audio modalities. By aligning modality-specific encoders with LLM inputs and diffusion decoders with LL…
External link:
http://arxiv.org/abs/2405.19335
Authors:
Lin, Yujun, Tang, Haotian, Yang, Shang, Zhang, Zhekai, Xiao, Guangxuan, Gan, Chuang, Han, Song
Quantization can accelerate large language model (LLM) inference. Going beyond INT8 quantization, the research community is actively exploring even lower precision, such as INT4. Nonetheless, state-of-the-art INT4 quantization techniques only acceler…
External link:
http://arxiv.org/abs/2405.04532
Industrial Internet of Things (IIoT) technologies have revolutionized industrial processes, enabling smart automation, real-time data analytics, and improved operational efficiency across diverse industry sectors. IIoT testbeds play a critical role i…
External link:
http://arxiv.org/abs/2404.17485
We present Condition-Aware Neural Network (CAN), a new method for adding control to image generative models. In contrast to prior conditional control methods, CAN controls the image generation process by dynamically manipulating the weight of the neu…
External link:
http://arxiv.org/abs/2404.01143
Published in:
IEEE Circuits and Systems Magazine, 23(3), pp. 8-34, October 2023
Tiny Machine Learning (TinyML) is a new frontier of machine learning. By squeezing deep learning models into billions of IoT devices and microcontrollers (MCUs), we expand the scope of AI applications and enable ubiquitous intelligence. However, Tiny…
External link:
http://arxiv.org/abs/2403.19076
Authors:
Li, Muyang, Cai, Tianle, Cao, Jiaxin, Zhang, Qinsheng, Cai, Han, Bai, Junjie, Jia, Yangqing, Liu, Ming-Yu, Li, Kai, Han, Song
Diffusion models have achieved great success in synthesizing high-quality images. However, generating high-resolution images with diffusion models is still challenging due to the enormous computational costs, resulting in a prohibitive latency for in…
External link:
http://arxiv.org/abs/2402.19481
Large Language Models (LLMs) are typically trained in two phases: pre-training on large internet-scale datasets, and fine-tuning for downstream tasks. Given the higher computational demand of pre-training, it's intuitive to assume that fine-tuning ad…
External link:
http://arxiv.org/abs/2402.10193
We present EfficientViT-SAM, a new family of accelerated segment anything models. We retain SAM's lightweight prompt encoder and mask decoder while replacing the heavy image encoder with EfficientViT. For the training, we begin with the knowledge dis…
External link:
http://arxiv.org/abs/2402.05008
Authors:
Zhang, Junyao, Wang, Hanrui, Ding, Qi, Gu, Jiaqi, Assouly, Reouven, Oliver, William D., Han, Song, Brown, Kenneth R., Li, Hai "Helen", Chen, Yiran
Noisy Intermediate-Scale Quantum (NISQ) computers face a critical limitation in qubit numbers, hindering their progression towards large-scale and fault-tolerant quantum computing. A significant challenge impeding scaling is crosstalk, characterized…
External link:
http://arxiv.org/abs/2401.17450