Model Acceleration for Efficient Deep Learning Computing
Author: | Cai, Han |
---|---|
Year of publication: | 2024 |
Document type: | Thesis |
Description: | Large foundation models play a central role in the recent fundamental breakthroughs in artificial intelligence. By simultaneously scaling up dataset and model size to an unprecedented level, these foundation models demonstrate remarkable performance in many areas, such as protein structure prediction, image/video synthesis, code generation, and chatbots. However, their computation and memory costs grow dramatically, making it difficult to deploy these foundation models in real-world applications, especially on resource-constrained edge devices. In addition, their prohibitive training cost significantly hinders the development of new foundation models and raises concerns about enormous energy consumption and CO2 emissions. To address these concerns, building effective model acceleration techniques is critical to closing the gap between the supply of and demand for computing. This thesis covers three important aspects of model acceleration. First, we discuss efficient representation learning, including EfficientViT (a new vision transformer architecture) for high-resolution vision and condition-aware neural networks (a new control module) for conditional image generation. Second, we present hardware-aware acceleration techniques that create specialized neural networks for different hardware platforms and efficiency constraints. Third, we introduce TinyTL, a memory-efficient transfer learning technique that enables on-device model customization. Through our design, we can significantly boost the efficiency of deep neural networks on hardware without losing accuracy, making them more accessible and reducing their serving cost. For example, our model delivers 48.9x higher throughput on an A100 GPU while achieving slightly better zero-shot instance segmentation performance than the state-of-the-art model. For conditional image generation, our approach achieves a 52x computational cost reduction without performance degradation. Ph.D. |
Database: | Networked Digital Library of Theses & Dissertations |
External link: |