Showing 1 - 10 of 517 for the search: "van Baalen, P."
Active colloidal particles typically exhibit a pronounced affinity for accumulating and being captured at boundaries. Here, we engineer long-range repulsive interactions between colloids that self-propel under an electric field and patterned obstacle…
External link:
http://arxiv.org/abs/2501.00660
Active systems comprising micron-sized self-propelling units, also termed microswimmers, are promising candidates for the bottom-up assembly of small structures and reconfigurable materials. Here we leverage field-driven colloidal assembly to induce…
External link:
http://arxiv.org/abs/2412.16658
Authors:
Federici, Marco, Belli, Davide, van Baalen, Mart, Jalalirad, Amir, Skliar, Andrii, Major, Bence, Nagel, Markus, Whatmough, Paul
While mobile devices provide ever more compute power, improvements in DRAM bandwidth are much slower. This is unfortunate for large language model (LLM) token generation, which is heavily memory-bound. Previous work has proposed to leverage natural d… (a back-of-envelope sketch of the bandwidth bound follows the link below)
External link:
http://arxiv.org/abs/2412.01380
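As context for the "heavily memory-bound" claim above: at batch size 1, each generated token has to stream roughly all model weights from DRAM, so the decode rate is capped by bandwidth divided by model size. A minimal back-of-envelope sketch in Python, with purely illustrative numbers that are not taken from the paper:

```python
# Why single-stream LLM decoding is memory-bound: every generated token must
# stream roughly all model weights from DRAM. All numbers below are assumptions
# chosen for illustration, not figures from the paper.

model_params = 7e9        # assumed 7B-parameter model
bytes_per_param = 2       # fp16/bf16 weights
dram_bandwidth = 50e9     # assumed ~50 GB/s mobile LPDDR bandwidth

bytes_per_token = model_params * bytes_per_param
tokens_per_second = dram_bandwidth / bytes_per_token

print(f"Weights streamed per token: {bytes_per_token / 1e9:.1f} GB")
print(f"Bandwidth-limited decode rate: {tokens_per_second:.1f} tokens/s")
```

Halving the bytes per weight (for example by quantization) roughly doubles this ceiling, which is why weight compression matters so much on mobile.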
Authors:
Skliar, Andrii, van Rozendaal, Ties, Lepert, Romain, Boinovski, Todor, van Baalen, Mart, Nagel, Markus, Whatmough, Paul, Bejnordi, Babak Ehteshami
Mixture of Experts (MoE) LLMs have recently gained attention for their ability to enhance performance by selectively engaging specialized subnetworks or "experts" for each input. However, deploying MoEs on memory-constrained devices remains challengi… (see the routing sketch after the link below)
External link:
http://arxiv.org/abs/2412.00099
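To make the "selectively engaging experts" idea concrete, here is a toy top-k routed Mixture-of-Experts layer in PyTorch. The dimensions are arbitrary and the per-expert loop is deliberately naive; this is a sketch of the general mechanism, not the paper's method or an efficient deployment path:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    """Toy Mixture-of-Experts layer: a linear router scores experts per token
    and only the top-k expert MLPs are evaluated for that token."""

    def __init__(self, dim=64, num_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):                         # x: (tokens, dim)
        scores = self.router(x)                   # (tokens, num_experts)
        weights, idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)      # mixing weights over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):                # naive dispatch loop, clarity over speed
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e          # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

x = torch.randn(16, 64)
print(TinyMoE()(x).shape)   # torch.Size([16, 64])
```

The memory-constrained-device difficulty is visible even in this toy: all expert weights must be resident (or paged in) even though each token only uses k of them.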
Authors:
Bhardwaj, Kartikeya, Pandey, Nilesh Prasad, Priyadarshi, Sweta, Ganapathy, Viswanath, Esteves, Rafael, Kadambi, Shreya, Borse, Shubhankar, Whatmough, Paul, Garrepalli, Risheek, Van Baalen, Mart, Teague, Harris, Nagel, Markus
In this paper, we propose Sparse High Rank Adapters (SHiRA) that directly finetune 1-2% of the base model weights while leaving others unchanged, thus resulting in a highly sparse adapter. This high sparsity incurs no inference overhead, enables rap… (see the sketch after the link below)
External link:
http://arxiv.org/abs/2407.16712
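To illustrate what a highly sparse adapter over 1-2% of the base weights means in practice, the sketch below freezes gradients everywhere except a sparse mask, so training only ever changes that small subset. The random mask and the gradient-hook mechanism are illustrative assumptions, not SHiRA's actual mask selection or implementation:

```python
import torch
import torch.nn as nn

def make_sparse_adapter(layer: nn.Linear, trainable_frac=0.01, seed=0):
    """Mark a random ~1% of a layer's weights as trainable by zeroing gradients
    outside a fixed sparse mask. The delta to the base weights then stays sparse."""
    g = torch.Generator().manual_seed(seed)
    mask = torch.rand(layer.weight.shape, generator=g) < trainable_frac
    layer.weight.register_hook(lambda grad: grad * mask)  # kill grads outside the mask
    return mask

layer = nn.Linear(512, 512)
mask = make_sparse_adapter(layer, trainable_frac=0.01)

layer(torch.randn(4, 512)).sum().backward()
print(f"trainable weights: {int(mask.sum())} / {layer.weight.numel()}")
print(f"weights with nonzero grad: {int((layer.weight.grad != 0).sum())}")
```

Because the adapter is just a sparse delta on existing weights, applying it at inference time is a plain weight update rather than an extra matmul, which is the "no inference overhead" point in the snippet above.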
Authors:
Bhardwaj, Kartikeya, Pandey, Nilesh Prasad, Priyadarshi, Sweta, Ganapathy, Viswanath, Esteves, Rafael, Kadambi, Shreya, Borse, Shubhankar, Whatmough, Paul, Garrepalli, Risheek, Van Baalen, Mart, Teague, Harris, Nagel, Markus
Low Rank Adaptation (LoRA) has gained massive attention in recent generative AI research. One of the main advantages of LoRA is its ability to be fused with pretrained models, adding no overhead during inference. However, from a mobile deployment… (see the fusion sketch after the link below)
External link:
http://arxiv.org/abs/2406.13175
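The "fused with pretrained models, adding no overhead" property comes from folding the low-rank update into the base weight, W <- W + (alpha / rank) * B @ A, so inference runs a single matmul. A minimal sketch; the shapes, alpha and rank are arbitrary illustrative choices:

```python
import torch
import torch.nn as nn

def fuse_lora(layer: nn.Linear, A: torch.Tensor, B: torch.Tensor, alpha=16, rank=8):
    """Fold a LoRA update into the base weight: W <- W + (alpha / rank) * B @ A.
    Assumed shapes: A is (rank, in_features), B is (out_features, rank)."""
    with torch.no_grad():
        layer.weight += (alpha / rank) * (B @ A)

layer = nn.Linear(256, 256)
A = torch.randn(8, 256) * 0.01
B = torch.randn(256, 8) * 0.01

x = torch.randn(1, 256)
unfused = layer(x) + (16 / 8) * (x @ A.T @ B.T)   # base output plus separate LoRA branch
fuse_lora(layer, A, B)
fused = layer(x)                                  # one matmul after fusion
print(torch.allclose(unfused, fused, atol=1e-5))  # True
```

The truncated "mobile deployment" concern here is presumably about swapping fused adapters quickly, the same issue the SHiRA entry above targets with sparse weight deltas.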
Authors:
van Baalen, Mart, Kuzmin, Andrey, Nagel, Markus, Couperus, Peter, Bastoul, Cedric, Mahurin, Eric, Blankevoort, Tijmen, Whatmough, Paul
In this work we show that the size versus accuracy trade-off of neural network quantization can be significantly improved by increasing the quantization dimensionality. We propose the GPTVQ method, a new fast method for post-training vector quantizat… (see the sketch after the link below)
External link:
http://arxiv.org/abs/2402.15319
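Here, "increasing the quantization dimensionality" means quantizing small groups of weights jointly against a shared codebook instead of one scalar at a time. The sketch below is plain k-means vector quantization and only illustrates that idea; GPTVQ itself uses a faster, data-aware procedure described in the paper:

```python
import numpy as np

def vector_quantize(weights, dim=2, codebook_size=256, iters=10, seed=0):
    """Toy k-means vector quantization: split the weights into `dim`-sized groups
    and replace each group with its nearest entry in a learned codebook."""
    rng = np.random.default_rng(seed)
    groups = weights.reshape(-1, dim)
    codebook = groups[rng.choice(len(groups), codebook_size, replace=False)].copy()
    for _ in range(iters):
        dists = ((groups[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        assign = dists.argmin(1)                  # nearest codeword per weight group
        for c in range(codebook_size):
            members = groups[assign == c]
            if len(members):
                codebook[c] = members.mean(0)     # recenter codewords
    return codebook[assign].reshape(weights.shape), assign

W = np.random.default_rng(1).normal(size=(64, 64)).astype(np.float32)
W_q, assign = vector_quantize(W)
bits_per_weight = np.log2(256) / 2   # one 8-bit index shared by each 2-weight group
print(f"~{bits_per_weight:.1f} bits/weight (ignoring codebook), MSE={((W - W_q) ** 2).mean():.4f}")
```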
Authors:
van der Ouderaa, Tycho F. A., Nagel, Markus, van Baalen, Mart, Asano, Yuki M., Blankevoort, Tijmen
State-of-the-art language models are becoming increasingly large in an effort to achieve the highest performance on large corpora of available textual data. However, the sheer size of the Transformer architectures makes it difficult to deploy models…
External link:
http://arxiv.org/abs/2312.17244
Quantizing neural networks is one of the most effective methods for achieving efficient inference on mobile and embedded devices. In particular, mixed precision quantized (MPQ) networks, whose layers can be quantized to different bitwidths, achieve b… (see the sketch after the link below)
External link:
http://arxiv.org/abs/2307.04535
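A minimal illustration of the mixed-precision idea, where each layer is quantized to its own bitwidth so that sensitive layers keep more bits. The layer names, shapes and bit assignments are invented for the example; choosing the per-layer bitwidths well is exactly what MPQ methods are about:

```python
import numpy as np

def quantize_uniform(w, bits):
    """Symmetric uniform quantization of a weight tensor to a given bitwidth."""
    levels = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / levels
    return np.round(w / scale).clip(-levels, levels) * scale

rng = np.random.default_rng(0)
layers = {"embed": rng.normal(size=(1000, 64)),
          "attn":  rng.normal(size=(64, 64)),
          "mlp":   rng.normal(size=(64, 256))}
bit_plan = {"embed": 8, "attn": 4, "mlp": 2}   # hypothetical per-layer bitwidths

for name, w in layers.items():
    err = ((w - quantize_uniform(w, bit_plan[name])) ** 2).mean()
    print(f"{name}: {bit_plan[name]}-bit, reconstruction MSE = {err:.5f}")
```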
Neural network pruning and quantization techniques are almost as old as neural networks themselves. However, to date only ad-hoc comparisons between the two have been published. In this paper, we set out to answer the question of which is better: neu… (see the sketch after the link below)
External link:
http://arxiv.org/abs/2307.02973
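To make the pruning-versus-quantization question concrete, the toy sketch below applies magnitude pruning and uniform quantization to the same random weight matrix at roughly comparable compression; the 75% sparsity versus 4-bit pairing is an illustrative assumption, not the paper's evaluation protocol:

```python
import numpy as np

def prune(w, sparsity):
    """Magnitude pruning: zero out the smallest-magnitude fraction of weights."""
    thresh = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) >= thresh, w, 0.0)

def quantize(w, bits):
    """Symmetric uniform quantization to a given bitwidth."""
    levels = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / levels
    return np.round(w / scale).clip(-levels, levels) * scale

w = np.random.default_rng(0).normal(size=(512, 512))
for name, w_c in [("prune to 75% sparsity", prune(w, 0.75)),
                  ("quantize to 4 bits", quantize(w, 4))]:
    print(f"{name}: reconstruction MSE = {((w - w_c) ** 2).mean():.5f}")
```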