Showing 1 - 10 of 285 for search: '"Schuster, Assaf"'
We introduce CompAct, a technique that reduces peak GPU memory utilization by 25-30% for pretraining and 50% for fine-tuning of LLMs. Peak device memory is a major limiting factor in training LLMs, with various recent works aiming to reduce model… (an illustrative sketch of activation compression follows the link below)
External link:
http://arxiv.org/abs/2410.15352
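The abstract is truncated by the catalog, but the topic is peak-memory reduction in LLM training. As a hedged, generic sketch only (not CompAct's actual algorithm; all names, shapes, and the projection scheme here are hypothetical), one way to trade activation memory for gradient accuracy is to store a random low-rank projection of a layer's input instead of the full activation and form an approximate weight gradient from it:

```python
import numpy as np

# Generic illustration, NOT CompAct's algorithm: for a linear layer Y = X @ W,
# the backward pass needs X to form dW = X^T dY. Instead of saving the full
# activation X, save a random projection of it (P can be regenerated from a
# seed, so it need not be stored) and compute an unbiased approximation of dW.
rng = np.random.default_rng(0)
batch, d_in, d_out, r = 256, 1024, 1024, 64   # r << batch: compression ratio r/batch

X = rng.standard_normal((batch, d_in))
W = rng.standard_normal((d_in, d_out)) / np.sqrt(d_in)
Y = X @ W                                      # forward pass

P = rng.standard_normal((batch, r)) / np.sqrt(r)
X_saved = P.T @ X                              # (r, d_in): a quarter of X's memory here

dY = rng.standard_normal((batch, d_out))       # upstream gradient
dW_approx = X_saved.T @ (P.T @ dY)             # E[X^T P P^T dY] = X^T dY
dW_exact = X.T @ dY

rel_err = np.linalg.norm(dW_approx - dW_exact) / np.linalg.norm(dW_exact)
print(f"stored {X_saved.size}/{X.size} activation floats, relative grad error {rel_err:.2f}")
```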
Many recent works use machine learning models to solve various complex algorithmic problems. However, these models attempt to reach a solution without considering the problem's required computational complexity, which can be detrimental to their ability…
External link:
http://arxiv.org/abs/2406.02187
Designing models that are both expressive and preserve the known invariances of a task is an increasingly hard problem. Existing solutions trade off invariance for computational or memory resources. In this work, we show how to leverage randomness and design… (an illustrative sketch of randomized symmetrization follows the link below)
External link:
http://arxiv.org/abs/2308.04412
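The entry is cut off, but the stated theme is leveraging randomness to reconcile expressiveness with invariance. Below is a minimal sketch of one generic instance of that trade, randomized symmetrization over a few sampled permutations; it is not the paper's construction, and `f` and all shapes are invented for illustration:

```python
import numpy as np

# Generic sketch: make an arbitrary, non-invariant function approximately
# permutation-invariant by averaging it over random permutations of its input,
# trading exactness of the invariance for extra compute.
rng = np.random.default_rng(5)
W = rng.standard_normal((8, 8))

def f(x):
    """An arbitrary map that is NOT permutation-invariant."""
    return np.tanh(W @ x).sum() + x[0] * 3.0

def f_rand_inv(x, k=64):
    """Randomized symmetrization: average f over k sampled permutations."""
    return np.mean([f(x[rng.permutation(len(x))]) for _ in range(k)])

x = rng.standard_normal(8)
x_perm = x[rng.permutation(8)]
print(f"plain      : {f(x):+.3f} vs {f(x_perm):+.3f}   (not invariant)")
print(f"randomized : {f_rand_inv(x):+.3f} vs {f_rand_inv(x_perm):+.3f}   (approx. invariant)")
```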
Popular machine learning approaches forgo second-order information due to the difficulty of computing curvature in high dimensions. We present FOSI, a novel meta-algorithm that improves the performance of any base first-order optimizer by efficiently… (a hedged sketch follows the link below)
External link:
http://arxiv.org/abs/2302.08484
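A hedged sketch of the general recipe the abstract gestures at, not FOSI itself: obtain curvature cheaply via Hessian-vector products, handle the most-curved direction with a Newton-scaled step, and leave the remaining directions to the base first-order optimizer (here, plain gradient descent on a test quadratic; all constants are invented):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
A = rng.standard_normal((n, n))
H = A @ A.T / n + np.eye(n)          # SPD Hessian of a test quadratic
b = rng.standard_normal(n)

grad = lambda x: H @ x - b           # gradient of f(x) = 0.5 x'Hx - b'x

def top_eigpair(hvp, n, iters=50):
    """Power iteration using only Hessian-vector products (no explicit H needed)."""
    v = rng.standard_normal(n)
    for _ in range(iters):
        v = hvp(v)
        v /= np.linalg.norm(v)
    return v @ hvp(v), v             # (eigenvalue, eigenvector)

lam, v = top_eigpair(lambda u: H @ u, n)

x, lr = np.zeros(n), 0.05
for _ in range(200):
    g = grad(x)
    g_top = (g @ v) * v              # component along the most-curved direction
    # Newton-scaled step in the curved direction, base-optimizer step elsewhere.
    x -= g_top / lam + lr * (g - g_top)

print("residual:", np.linalg.norm(grad(x)))
```

The actual meta-algorithm is more involved; the single-direction version above only illustrates the first-order/second-order split.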
Author:
Shapira, Guy; Schuster, Assaf
Complex Event Processing (CEP) is a set of methods that allow efficient knowledge extraction from massive data streams using complex and highly descriptive patterns. Numerous applications, such as online finance, healthcare monitoring, and fraud detection… (a toy pattern-matching sketch follows the link below)
External link:
http://arxiv.org/abs/2207.14017
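As a toy illustration of the kind of pattern CEP engines evaluate (a generic sketch, not the system proposed in the paper; the events and attributes are invented), here is a fraud-flavored sequence pattern, "large withdrawal followed by a foreign login by the same user, within 3 events":

```python
# Toy CEP-style SEQ pattern over an event stream, with a sliding window.
stream = [
    ("login",      {"user": "u1", "country": "US"}),
    ("withdrawal", {"user": "u1", "amount": 9000}),
    ("login",      {"user": "u1", "country": "RO"}),
    ("withdrawal", {"user": "u2", "amount": 50}),
]

WINDOW = 3
pending = []  # (event_index, withdrawal_attrs): open partial matches

for i, (kind, attrs) in enumerate(stream):
    pending = [(j, w) for j, w in pending if i - j < WINDOW]  # expire old ones
    if kind == "login" and attrs["country"] != "US":
        for j, w in pending:  # complete the pattern against open partial matches
            if w["user"] == attrs["user"]:
                print(f"ALERT: withdrawal of {w['amount']} then foreign login "
                      f"({attrs['country']}) for {w['user']}")
    elif kind == "withdrawal" and attrs["amount"] > 5000:
        pending.append((i, attrs))  # open a new partial match
```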
We show that neural networks with access to randomness can outperform deterministic networks by using amplification. We call such networks Coin-Flipping Neural Networks, or CFNNs. We show that a CFNN can approximate the indicator of a $d$-dimensional… (a sketch of the amplification trick follows the link below)
External link:
http://arxiv.org/abs/2206.09182
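A minimal sketch of the amplification trick the abstract names, under invented parameters (this is not the paper's CFNN construction): a randomized membership test for the unit ball that is only moderately accurate per coin flip becomes far more accurate when repeated with fresh randomness and combined by majority vote:

```python
import numpy as np

rng = np.random.default_rng(2)
d, n_points, sigma = 10, 2000, 0.3

# Test points with norms uniform in [0.5, 1.5]; label = inside the unit ball.
X = rng.standard_normal((n_points, d))
X *= (rng.uniform(0.5, 1.5, n_points) / np.linalg.norm(X, axis=1))[:, None]
labels = np.linalg.norm(X, axis=1) <= 1.0

def noisy_vote(X, rng):
    """One coin flip: threshold the norm corrupted by fresh Gaussian noise."""
    return np.linalg.norm(X, axis=1) + sigma * rng.standard_normal(len(X)) <= 1.0

for k in [1, 5, 25, 125]:
    votes = sum(noisy_vote(X, rng) for _ in range(k))
    pred = votes > k / 2                      # majority vote over k fresh runs
    print(f"k={k:4d}  majority-vote accuracy: {(pred == labels).mean():.3f}")
```

Points very close to the boundary remain hard, so accuracy saturates below 1; the point is the monotone gain from repetition.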
We consider stochastic convex optimization problems where several machines act asynchronously in parallel while sharing a common memory. We propose a robust training method for the constrained setting and derive non-asymptotic convergence guarantees… (a generic asynchronous-SGD sketch follows the link below)
External link:
http://arxiv.org/abs/2106.12261
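A generic illustration of the setting, assuming a Hogwild-style lock-free shared parameter vector and a unit-ball constraint handled by projection (this is not the paper's proposed method; problem sizes and constants are invented):

```python
import threading
import numpy as np

rng = np.random.default_rng(3)
n, d = 4000, 20
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
x_true /= 2 * np.linalg.norm(x_true)           # ground truth inside the unit ball
y = A @ x_true + 0.01 * rng.standard_normal(n)

x = np.zeros(d)                                # shared memory, updated lock-free

def worker(seed, steps=5000, lr=1e-3):
    r = np.random.default_rng(seed)
    for _ in range(steps):
        i = r.integers(n)
        g = (A[i] @ x - y[i]) * A[i]           # stochastic gradient at current shared x
        z = x - lr * g
        z /= max(1.0, np.linalg.norm(z))       # project onto the unit-ball constraint
        x[:] = z                               # racy write: other threads interleave

threads = [threading.Thread(target=worker, args=(s,)) for s in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print("error:", np.linalg.norm(x - x_true))
```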
Can deep neural networks learn to solve any task, and in particular problems of high complexity? This question attracts a lot of interest, with recent works tackling computationally hard tasks such as the traveling salesman problem and satisfiability…
External link:
http://arxiv.org/abs/2002.09398
Published in:
Proceedings of the 7th NIPS Workshop on Optimization for Machine Learning, 2014
We consider distributed online learning protocols that control the exchange of information between local learners in a round-based learning scenario. The learning performance of such a protocol is intuitively optimal if approximately the same loss is… (a threshold-synchronization sketch follows the link below)
External link:
http://arxiv.org/abs/1911.12896
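A hedged sketch of the setting, not the protocol from the paper: local learners take online gradient steps and communicate only when some local model drifts from the last synchronized reference by more than a threshold, so communication is spent only when models actually diverge. All constants are invented:

```python
import numpy as np

rng = np.random.default_rng(6)
k_learners, d, rounds, lr, thresh = 4, 10, 500, 0.05, 0.5
w_true = rng.standard_normal(d)

models = np.zeros((k_learners, d))
reference = np.zeros(d)            # last globally synchronized model
comms = 0

for _ in range(rounds):
    for m in range(k_learners):
        xi = rng.standard_normal(d)                 # one observation per round
        err = models[m] @ xi - w_true @ xi
        models[m] -= lr * err * xi                  # local online gradient step
    if np.linalg.norm(models - reference, axis=1).max() > thresh:
        reference = models.mean(axis=0)             # synchronize on violation only
        models[:] = reference
        comms += 1

print(f"synchronizations: {comms}/{rounds}, error: {np.linalg.norm(reference - w_true):.3f}")
```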
Cloud computing is becoming increasingly popular as a platform for distributed training of deep neural networks. Synchronous stochastic gradient descent (SSGD) suffers from substantial slowdowns due to stragglers if the environment is non-dedicated… (a straggler-mitigation sketch follows the link below)
External link:
http://arxiv.org/abs/1909.10802
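As a generic illustration of straggler mitigation in synchronous SGD (not the paper's method; the exponential timing model and all constants are assumptions for the simulation), averaging the gradients of only the first k of n workers each round removes the tail of the per-round barrier:

```python
import numpy as np

rng = np.random.default_rng(4)
n_workers, k, rounds, d, lr = 16, 12, 1000, 30, 0.1
w_true = rng.standard_normal(d)
w = np.zeros(d)

wall_clock_all, wall_clock_k = 0.0, 0.0
for _ in range(rounds):
    # Each worker computes a noisy gradient of f(w) = 0.5 * ||w - w_true||^2.
    grads = (w - w_true) + 0.5 * rng.standard_normal((n_workers, d))
    times = 1.0 + rng.exponential(1.0, n_workers)  # straggler-prone durations
    wall_clock_all += times.max()                  # full barrier: wait for slowest
    fastest = np.argsort(times)[:k]
    wall_clock_k += times[fastest].max()           # partial barrier: first k only
    w -= lr * grads[fastest].mean(axis=0)

print(f"param error {np.linalg.norm(w - w_true):.3f}")
print(f"time, full barrier: {wall_clock_all:.0f}; first-{k} barrier: {wall_clock_k:.0f}")
```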