Showing 1 - 10 of 44 for search: '"Abduljabbar, Mustafa"'
Author:
Anthony, Quentin, Michalowicz, Benjamin, Hatef, Jacob, Xu, Lang, Abduljabbar, Mustafa, Shafi, Aamir, Subramoni, Hari, Panda, Dhabaleswar
Deep learning (DL) models based on the transformer architecture have revolutionized many DL applications such as large language models (LLMs), vision transformers, audio generation, and time series prediction. Much of this progress has been fueled by…
External link:
http://arxiv.org/abs/2408.10197
Author:
Abduljabbar, Mustafa
We present algorithms and implementations that overcome obstacles in the migration of the Fast Multipole Method (FMM), one of the most important algorithms in computational science and engineering, to exascale computing. Emerging architectural approa…
External link:
http://hdl.handle.net/10754/630221
Author:
Hultgren, Tova, Abduljabbar, Mustafa
Digitalization is a modern development that has given rise to increased remote work at many companies. What has, however, accelerated remote work and the home office is the extensive and drastic impact that the COVID-19 pandemic had on the world.
External link:
http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-63877
Author:
Anthony, Quentin, Awan, Ammar Ahmad, Rasley, Jeff, He, Yuxiong, Shafi, Aamir, Abduljabbar, Mustafa, Subramoni, Hari, Panda, Dhabaleswar
In recent years, the training requirements of many state-of-the-art Deep Learning (DL) models have scaled beyond the compute and memory capabilities of a single processor, and necessitated distribution among processors. Training such massive models n…
External link:
http://arxiv.org/abs/2303.08374
Author:
Ahn, Hyunho, Chen, Tian, Alnaasan, Nawras, Shafi, Aamir, Abduljabbar, Mustafa, Subramoni, Hari, Panda, Dhabaleswar K.
Quantization is a popular technique used in Deep Neural Networks (DNN) inference to reduce the size of models and improve the overall numerical performance by exploiting native hardware. This paper attempts to conduct an elaborate performance charact…
External link:
http://arxiv.org/abs/2303.05016
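The abstract above introduces quantization for DNN inference. A minimal sketch of symmetric per-tensor int8 post-training quantization, the general technique being characterized (illustrative only; function names and the scaling scheme are assumptions, not the paper's setup):

```python
def quantize_int8(weights):
    """Map float weights to int8 codes using a per-tensor scale."""
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / 127.0                      # one scale for the whole tensor
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Approximate reconstruction of the original floats."""
    return [v * scale for v in q]

q, s = quantize_int8([0.5, -1.27, 0.0, 1.0])
print(q)  # [50, -127, 0, 100]
```

Inference kernels then operate on the int8 codes and fold the scale back in at the end, which is what lets native integer hardware be exploited.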
Chiplets have become a common methodology in modern chip design. Chiplets improve yield and enable heterogeneity at the level of cores, memory subsystem and the interconnect. Convolutional Neural Networks (CNNs) have high computational, bandwidth and…
External link:
http://arxiv.org/abs/2202.11575
Parallel applications often rely on work stealing schedulers in combination with fine-grained tasking to achieve high performance and scalability. However, reducing the total energy consumption in the context of work stealing runtimes is still challe…
External link:
http://arxiv.org/abs/2201.12186
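The abstract above refers to work-stealing schedulers with fine-grained tasking. The core data structure is a per-worker deque: the owner pushes and pops tasks at one end (LIFO, for locality), while idle workers steal from the other end (FIFO, taking the oldest and typically largest tasks). A toy single-threaded sketch of that discipline (class and method names are illustrative, not the paper's runtime):

```python
from collections import deque

class WorkStealingDeque:
    """Per-worker task deque: owner works the bottom, thieves take the top."""

    def __init__(self):
        self._tasks = deque()

    def push(self, task):
        """Owner enqueues a newly spawned task at the bottom (LIFO end)."""
        self._tasks.append(task)

    def pop(self):
        """Owner takes its most recently pushed task, or None if empty."""
        return self._tasks.pop() if self._tasks else None

    def steal(self):
        """A thief takes the oldest task from the top, or None if empty."""
        return self._tasks.popleft() if self._tasks else None
```

A real runtime adds synchronization (e.g. a CAS on the top index) so steals and owner pops can race safely; the energy question the abstract raises comes from how aggressively idle workers spin while attempting such steals.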
Efficient runtime task scheduling on complex memory hierarchy becomes increasingly important as modern and future High-Performance Computing (HPC) systems are progressively composed of multisocket and multi-chiplet nodes with nonuniform memory access…
External link:
http://arxiv.org/abs/2112.09509
Shared resource interference is observed by applications as dynamic performance asymmetry. Prior art has developed approaches to reduce the impact of performance asymmetry mainly at the operating system and architectural levels. In this work, we stud…
External link:
http://arxiv.org/abs/2009.00915
Academic article
This result cannot be displayed for unauthenticated users.
Sign in to view this result.