Showing 1 - 10 of 287 for search: '"Mitra, Tulika"'
Author:
Dangi, Pranav, Bandara, Thilini Kaushalya, Sheikhpour, Saeideh, Mitra, Tulika, Eeckhout, Lieven
Hardware specialization is commonly viewed as a way to scale performance in the dark silicon era, with modern-day SoCs featuring multiple tens of dedicated accelerators. By only powering on hardware circuitry when needed, accelerators fundamentally…
External link:
http://arxiv.org/abs/2411.09315
Author:
Binici, Kuluhan, Aggarwal, Shivam, Acar, Cihan, Pham, Nam Trung, Leman, Karianto, Lee, Gim Hee, Mitra, Tulika
Knowledge distillation (KD) is a key element in neural network compression that allows knowledge transfer from a pre-trained teacher model to a more compact student model. KD relies on access to the training dataset, which may not always be fully available… (a generic KD objective is sketched after this record)
External link:
http://arxiv.org/abs/2408.13850
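As a point of reference for the snippet above, here is a minimal PyTorch sketch of the classic soft-label KD objective. The temperature and alpha values are illustrative defaults, and this generic loss is not the paper's data-free method, which works without the original training set.

    import torch
    import torch.nn.functional as F

    def kd_loss(student_logits, teacher_logits, labels, temperature=4.0, alpha=0.5):
        # Soften both output distributions with the temperature.
        soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
        log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
        # KL term pulls the student toward the teacher; the T^2 factor
        # keeps gradient magnitudes comparable across temperatures.
        distill = F.kl_div(log_soft_student, soft_teacher,
                           reduction="batchmean") * temperature ** 2
        # Ordinary cross-entropy on the ground-truth labels.
        hard = F.cross_entropy(student_logits, labels)
        return alpha * distill + (1 - alpha) * hard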
Knowledge distillation (KD) is a model compression method that entails training a compact student model to emulate the performance of a more complex teacher model. However, the architectural capacity gap between the two models limits the effectiveness…
External link:
http://arxiv.org/abs/2407.16040
Efficiently supporting long context length is crucial for Transformer models. The quadratic complexity of the self-attention computation plagues traditional Transformers. Sliding-window-based static sparse attention mitigates the problem by limiting… (a windowed-attention sketch follows this record)
External link:
http://arxiv.org/abs/2405.17025
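For context on the snippet's point: dense self-attention costs O(n^2) in sequence length n, while a static sliding window lets each query attend only to a local band, cutting the cost to O(n·w) for window width w. A minimal PyTorch sketch of the masking idea (the dense matmul is kept for clarity; real kernels compute only the band):

    import torch

    def sliding_window_mask(seq_len, window):
        # True where |i - j| <= window, i.e. a banded attention pattern.
        idx = torch.arange(seq_len)
        return (idx[None, :] - idx[:, None]).abs() <= window

    def windowed_attention(q, k, v, window):
        # q, k, v: (seq_len, d). Only banded positions survive the mask.
        scores = q @ k.T / q.shape[-1] ** 0.5
        mask = sliding_window_mask(q.shape[0], window)
        scores = scores.masked_fill(~mask, float("-inf"))
        return torch.softmax(scores, dim=-1) @ v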
Heart disease is one of the leading causes of death worldwide. Given its high risk and often asymptomatic nature, real-time continuous monitoring is essential. Unlike traditional artificial neural networks (ANNs), spiking neural networks (SNNs) are… (a leaky integrate-and-fire sketch follows this record)
External link:
http://arxiv.org/abs/2406.06543
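To make the ANN/SNN contrast concrete: SNN neurons exchange binary spikes over time instead of continuous activations, so computation happens only on spike events. A minimal leaky integrate-and-fire step, with the decay and threshold as illustrative parameters:

    import torch

    def lif_step(v, x, decay=0.9, threshold=1.0):
        # Leak the membrane potential, then integrate the input current.
        v = decay * v + x
        # Emit a binary spike wherever the threshold is crossed.
        spike = (v >= threshold).float()
        # Hard-reset the potential of neurons that just spiked.
        v = v * (1.0 - spike)
        return v, spike

Run over many timesteps, the energy cost tracks the spike count rather than dense multiply-accumulates, which is what makes SNNs attractive for always-on monitoring.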
Machine learning pipelines for classification tasks often train a universal model to achieve accuracy across a broad range of classes. However, a typical user regularly encounters only a limited selection of classes. This disparity provides an opportunity… (a logit-masking illustration follows this record)
External link:
http://arxiv.org/abs/2311.14272
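One simple way to picture exploiting that disparity (a hypothetical illustration, not necessarily the paper's technique) is to restrict a universal classifier's output space to the classes a given user actually encounters:

    import torch

    def restrict_to_subset(logits, active_classes):
        # Send every logit outside the user's class subset to -inf so
        # predictions can only land on classes the user actually sees.
        mask = torch.full_like(logits, float("-inf"))
        mask[..., active_classes] = 0.0
        return logits + mask

    # A 1000-class universal model, but this user only meets 3 classes.
    logits = torch.randn(1, 1000)
    pred = restrict_to_subset(logits, [7, 42, 301]).argmax(dim=-1)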
Author:
Aggarwal, Shivam, Damsgaard, Hans Jakob, Pappalardo, Alessandro, Franco, Giuseppe, Preußer, Thomas B., Blott, Michaela, Mitra, Tulika
Post-training quantization (PTQ) is a powerful technique for model compression, reducing the numerical precision in neural networks without additional training overhead. Recent works have investigated adopting 8-bit floating-point formats (FP8) in the… (a minifloat rounding sketch follows this record)
External link:
http://arxiv.org/abs/2311.12359
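To illustrate what "FP8" means here: a minifloat keeps a sign, a few exponent bits, and a few mantissa bits. The rough fake-quantization sketch below rounds values onto such a grid (E4M3-like by default); it ignores subnormals, NaN encodings, and per-tensor scaling, all of which matter in real PTQ pipelines.

    import torch

    def fake_quant_minifloat(x, exp_bits=4, man_bits=3):
        bias = 2 ** (exp_bits - 1) - 1
        # Split x = m * 2**e with |m| in [0.5, 1).
        m, e = torch.frexp(x)
        # Round the mantissa to man_bits fractional bits (the leading
        # bit is implicit in frexp's [0.5, 1) normalization).
        m_q = torch.round(m * 2 ** (man_bits + 1)) / 2 ** (man_bits + 1)
        # Clamp the exponent into the representable range.
        e_q = e.clamp(-bias + 1, bias + 1)
        return torch.ldexp(m_q, e_q)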
Author:
Li, Huize, Mitra, Tulika
Sparse matrix-matrix multiplication (SpGEMM) is a critical kernel widely employed in machine learning and graph algorithms. However, the high sparsity of real-world matrices makes SpGEMM memory-intensive. In-situ computing offers the potential to accelerate… (a row-wise SpGEMM sketch follows this record)
External link:
http://arxiv.org/abs/2311.03826
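For reference, the classic row-wise (Gustavson) formulation of SpGEMM, which only touches nonzero partial products; each row here is a plain {column: value} dict rather than a real CSR structure:

    def spgemm(a_rows, b_rows):
        # C[i] = sum over nonzeros A[i][k] of A[i][k] * B[k].
        c_rows = []
        for a_row in a_rows:
            acc = {}
            for k, a_val in a_row.items():
                for j, b_val in b_rows[k].items():
                    acc[j] = acc.get(j, 0.0) + a_val * b_val
            c_rows.append(acc)
        return c_rows

    # A = [[1, 0], [0, 2]], B = [[0, 3], [4, 0]] in dict-of-columns form.
    print(spgemm([{0: 1.0}, {1: 2.0}], [{1: 3.0}, {0: 4.0}]))
    # -> [{1: 3.0}, {0: 8.0}]

The irregular, data-dependent accesses to rows of B are exactly what makes SpGEMM memory-bound on conventional hardware.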
Classic Graph Neural Network (GNN) inference approaches, designed for static graphs, are ill-suited for streaming graphs that evolve over time. The dynamism intrinsic to streaming graphs necessitates constant updates, posing unique challenges to… (an incremental-update sketch follows this record)
External link:
http://arxiv.org/abs/2309.11071
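A common incremental strategy for streaming graphs (a sketch of the general idea, not necessarily this paper's design) is to refresh only the embeddings whose neighborhoods an update actually touches:

    import torch

    def mean_aggregate(h, neighbors, node):
        # One GNN layer's mean aggregation for a single node.
        nbrs = list(neighbors[node])
        return h[nbrs].mean(dim=0) if nbrs else h[node]

    def add_edge(h_in, h_out, neighbors, u, v):
        # Insert edge (u, v), then refresh only the affected nodes
        # instead of re-running inference over the whole graph.
        neighbors[u].add(v)
        neighbors[v].add(u)
        for node in (u, v):
            h_out[node] = mean_aggregate(h_in, neighbors, node)

With L stacked layers the dirty set grows to the L-hop neighborhood of each update, which is where the acceleration challenge lies.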
Coarse-Grained Reconfigurable Arrays (CGRAs) are promising edge accelerators due to their outstanding balance of flexibility, performance, and energy efficiency. Classic CGRAs statically map compute operations onto the processing elements (PEs) and route… (a toy spatial-mapping sketch follows this record)
External link:
http://arxiv.org/abs/2309.10623
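To make "statically map and route" concrete, here is a toy spatial mapper over a 2x2 PE mesh: it brute-forces an assignment of operations to PEs such that every producer-consumer edge lands on adjacent PEs. Real CGRA mappers additionally schedule operations in time (e.g., modulo scheduling) and route multi-hop paths; the mesh and dataflow graph here are made up for illustration.

    from itertools import permutations

    PES = [0, 1, 2, 3]  # a 2x2 mesh: PEs 0-1 on the top row, 2-3 below
    MESH = {(0, 1), (1, 0), (2, 3), (3, 2),
            (0, 2), (2, 0), (1, 3), (3, 1)}

    def map_dfg(ops, deps):
        # Try every one-to-one placement of ops onto PEs and accept the
        # first one where all dependent ops sit on neighbouring PEs.
        for placement in permutations(PES, len(ops)):
            slot = dict(zip(ops, placement))
            if all((slot[a], slot[b]) in MESH for a, b in deps):
                return slot
        return None

    # (x + y) * z as a two-node dataflow graph: add feeds mul.
    print(map_dfg(["add", "mul"], [("add", "mul")]))  # e.g. {'add': 0, 'mul': 1}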