Showing 1 - 10 of 285 for search: '"Schuster, Assaf"'
We introduce CompAct, a technique that reduces peak GPU memory utilization by 25-30% for pretraining and 50% for fine-tuning of LLMs. Peak device memory is a major limiting factor in training LLMs, with various recent works aiming to reduce model… (an illustrative sketch of activation compression follows the link below)
External link:
http://arxiv.org/abs/2410.15352
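The abstract is truncated by the catalog, but the topic is peak-memory reduction in LLM training. As a hedged, generic sketch only (not CompAct's actual algorithm; all names, shapes, and the projection scheme here are hypothetical), one way to trade activation memory for gradient accuracy is to store a random low-rank projection of a layer's input instead of the full activation and form an approximate weight gradient from it:

```python
import numpy as np

# Generic illustration, NOT CompAct's algorithm: for a linear layer Y = X @ W,
# the backward pass needs X to form dW = X^T dY. Instead of saving the full
# activation X, save a random projection of it (P can be regenerated from a
# seed, so it need not be stored) and compute an unbiased approximation of dW.
rng = np.random.default_rng(0)
batch, d_in, d_out, r = 256, 1024, 1024, 64   # r << batch: compression ratio r/batch

X = rng.standard_normal((batch, d_in))
W = rng.standard_normal((d_in, d_out)) / np.sqrt(d_in)
Y = X @ W                                      # forward pass

P = rng.standard_normal((batch, r)) / np.sqrt(r)
X_saved = P.T @ X                              # (r, d_in): a quarter of X's memory here

dY = rng.standard_normal((batch, d_out))       # upstream gradient
dW_approx = X_saved.T @ (P.T @ dY)             # E[X^T P P^T dY] = X^T dY
dW_exact = X.T @ dY

rel_err = np.linalg.norm(dW_approx - dW_exact) / np.linalg.norm(dW_exact)
print(f"stored {X_saved.size}/{X.size} activation floats, relative grad error {rel_err:.2f}")
```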
Many recent works use machine learning models to solve various complex algorithmic problems. However, these models attempt to reach a solution without considering the problem's required computational complexity, which can be detrimental to their ability…
External link:
http://arxiv.org/abs/2406.02187
Designing models that are both expressive and preserve the known invariances of a task is an increasingly hard problem. Existing solutions trade off invariance for computational or memory resources. In this work, we show how to leverage randomness and design… (an illustrative sketch of randomized symmetrization follows the link below)
External link:
http://arxiv.org/abs/2308.04412
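The entry is cut off, but the stated theme is leveraging randomness to reconcile expressiveness with invariance. Below is a minimal sketch of one generic instance of that trade, randomized symmetrization over a few sampled permutations; it is not the paper's construction, and `f` and all shapes are invented for illustration:

```python
import numpy as np

# Generic sketch: make an arbitrary, non-invariant function approximately
# permutation-invariant by averaging it over random permutations of its input,
# trading exactness of the invariance for extra compute.
rng = np.random.default_rng(5)
W = rng.standard_normal((8, 8))

def f(x):
    """An arbitrary map that is NOT permutation-invariant."""
    return np.tanh(W @ x).sum() + x[0] * 3.0

def f_rand_inv(x, k=64):
    """Randomized symmetrization: average f over k sampled permutations."""
    return np.mean([f(x[rng.permutation(len(x))]) for _ in range(k)])

x = rng.standard_normal(8)
x_perm = x[rng.permutation(8)]
print(f"plain      : {f(x):+.3f} vs {f(x_perm):+.3f}   (not invariant)")
print(f"randomized : {f_rand_inv(x):+.3f} vs {f_rand_inv(x_perm):+.3f}   (approx. invariant)")
```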
Popular machine learning approaches forgo second-order information due to the difficulty of computing curvature in high dimensions. We present FOSI, a novel meta-algorithm that improves the performance of any base first-order optimizer by efficiently… (a hedged sketch follows the link below)
External link:
http://arxiv.org/abs/2302.08484
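A hedged sketch of the general recipe the abstract gestures at, not FOSI itself: obtain curvature cheaply via Hessian-vector products, handle the most-curved direction with a Newton-scaled step, and leave the remaining directions to the base first-order optimizer (here, plain gradient descent on a test quadratic; all constants are invented):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
A = rng.standard_normal((n, n))
H = A @ A.T / n + np.eye(n)          # SPD Hessian of a test quadratic
b = rng.standard_normal(n)

grad = lambda x: H @ x - b           # gradient of f(x) = 0.5 x'Hx - b'x

def top_eigpair(hvp, n, iters=50):
    """Power iteration using only Hessian-vector products (no explicit H needed)."""
    v = rng.standard_normal(n)
    for _ in range(iters):
        v = hvp(v)
        v /= np.linalg.norm(v)
    return v @ hvp(v), v             # (eigenvalue, eigenvector)

lam, v = top_eigpair(lambda u: H @ u, n)

x, lr = np.zeros(n), 0.05
for _ in range(200):
    g = grad(x)
    g_top = (g @ v) * v              # component along the most-curved direction
    # Newton-scaled step in the curved direction, base-optimizer step elsewhere.
    x -= g_top / lam + lr * (g - g_top)

print("residual:", np.linalg.norm(grad(x)))
```

The actual meta-algorithm is more involved; the single-direction version above only illustrates the first-order/second-order split.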
Author:
Shapira, Guy; Schuster, Assaf
Complex Event Processing (CEP) is a set of methods that allow efficient knowledge extraction from massive data streams using complex and highly descriptive patterns. Numerous applications, such as online finance, healthcare monitoring, and fraud detection… (a toy pattern-matching sketch follows the link below)
External link:
http://arxiv.org/abs/2207.14017
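As a toy illustration of the kind of pattern CEP engines evaluate (a generic sketch, not the system proposed in the paper; the events and attributes are invented), here is a fraud-flavored sequence pattern, "large withdrawal followed by a foreign login by the same user, within 3 events":

```python
# Toy CEP-style SEQ pattern over an event stream, with a sliding window.
stream = [
    ("login",      {"user": "u1", "country": "US"}),
    ("withdrawal", {"user": "u1", "amount": 9000}),
    ("login",      {"user": "u1", "country": "RO"}),
    ("withdrawal", {"user": "u2", "amount": 50}),
]

WINDOW = 3
pending = []  # (event_index, withdrawal_attrs): open partial matches

for i, (kind, attrs) in enumerate(stream):
    pending = [(j, w) for j, w in pending if i - j < WINDOW]  # expire old ones
    if kind == "login" and attrs["country"] != "US":
        for j, w in pending:  # complete the pattern against open partial matches
            if w["user"] == attrs["user"]:
                print(f"ALERT: withdrawal of {w['amount']} then foreign login "
                      f"({attrs['country']}) for {w['user']}")
    elif kind == "withdrawal" and attrs["amount"] > 5000:
        pending.append((i, attrs))  # open a new partial match
```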
We show that neural networks with access to randomness can outperform deterministic networks by using amplification. We call such networks Coin-Flipping Neural Networks, or CFNNs. We show that a CFNN can approximate the indicator of a $d$-dimensional… (a sketch of the amplification trick follows the link below)
External link:
http://arxiv.org/abs/2206.09182
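A minimal sketch of the amplification trick the abstract names, under invented parameters (this is not the paper's CFNN construction): a randomized membership test for the unit ball that is only moderately accurate per coin flip becomes far more accurate when repeated with fresh randomness and combined by majority vote:

```python
import numpy as np

rng = np.random.default_rng(2)
d, n_points, sigma = 10, 2000, 0.3

# Test points with norms uniform in [0.5, 1.5]; label = inside the unit ball.
X = rng.standard_normal((n_points, d))
X *= (rng.uniform(0.5, 1.5, n_points) / np.linalg.norm(X, axis=1))[:, None]
labels = np.linalg.norm(X, axis=1) <= 1.0

def noisy_vote(X, rng):
    """One coin flip: threshold the norm corrupted by fresh Gaussian noise."""
    return np.linalg.norm(X, axis=1) + sigma * rng.standard_normal(len(X)) <= 1.0

for k in [1, 5, 25, 125]:
    votes = sum(noisy_vote(X, rng) for _ in range(k))
    pred = votes > k / 2                      # majority vote over k fresh runs
    print(f"k={k:4d}  majority-vote accuracy: {(pred == labels).mean():.3f}")
```

Points very close to the boundary remain hard, so accuracy saturates below 1; the point is the monotone gain from repetition.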
We consider stochastic convex optimization problems where several machines act asynchronously in parallel while sharing a common memory. We propose a robust training method for the constrained setting and derive non-asymptotic convergence guarantees… (a generic asynchronous-SGD sketch follows the link below)
External link:
http://arxiv.org/abs/2106.12261
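A generic illustration of the setting, assuming a Hogwild-style lock-free shared parameter vector and a unit-ball constraint handled by projection (this is not the paper's proposed method; problem sizes and constants are invented):

```python
import threading
import numpy as np

rng = np.random.default_rng(3)
n, d = 4000, 20
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
x_true /= 2 * np.linalg.norm(x_true)           # ground truth inside the unit ball
y = A @ x_true + 0.01 * rng.standard_normal(n)

x = np.zeros(d)                                # shared memory, updated lock-free

def worker(seed, steps=5000, lr=1e-3):
    r = np.random.default_rng(seed)
    for _ in range(steps):
        i = r.integers(n)
        g = (A[i] @ x - y[i]) * A[i]           # stochastic gradient at current shared x
        z = x - lr * g
        z /= max(1.0, np.linalg.norm(z))       # project onto the unit-ball constraint
        x[:] = z                               # racy write: other threads interleave

threads = [threading.Thread(target=worker, args=(s,)) for s in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print("error:", np.linalg.norm(x - x_true))
```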
Can deep neural networks learn to solve any task, and in particular problems of high complexity? This question attracts a lot of interest, with recent works tackling computationally hard tasks such as the traveling salesman problem and satisfiability…
External link:
http://arxiv.org/abs/2002.09398
Published in:
Proceedings of the 7th NIPS Workshop on Optimization for Machine Learning, 2014
We consider distributed online learning protocols that control the exchange of information between local learners in a round-based learning scenario. The learning performance of such a protocol is intuitively optimal if approximately the same loss is… (a threshold-synchronization sketch follows the link below)
External link:
http://arxiv.org/abs/1911.12896
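A hedged sketch of the setting, not the protocol from the paper: local learners take online gradient steps and communicate only when some local model drifts from the last synchronized reference by more than a threshold, so communication is spent only when models actually diverge. All constants are invented:

```python
import numpy as np

rng = np.random.default_rng(6)
k_learners, d, rounds, lr, thresh = 4, 10, 500, 0.05, 0.5
w_true = rng.standard_normal(d)

models = np.zeros((k_learners, d))
reference = np.zeros(d)            # last globally synchronized model
comms = 0

for _ in range(rounds):
    for m in range(k_learners):
        xi = rng.standard_normal(d)                 # one observation per round
        err = models[m] @ xi - w_true @ xi
        models[m] -= lr * err * xi                  # local online gradient step
    if np.linalg.norm(models - reference, axis=1).max() > thresh:
        reference = models.mean(axis=0)             # synchronize on violation only
        models[:] = reference
        comms += 1

print(f"synchronizations: {comms}/{rounds}, error: {np.linalg.norm(reference - w_true):.3f}")
```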
Cloud computing is becoming increasingly popular as a platform for distributed training of deep neural networks. Synchronous stochastic gradient descent (SSGD) suffers from substantial slowdowns due to stragglers if the environment is non-dedicated… (a straggler-mitigation sketch follows the link below)
External link:
http://arxiv.org/abs/1909.10802
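As a generic illustration of straggler mitigation in synchronous SGD (not the paper's method; the exponential timing model and all constants are assumptions for the simulation), averaging the gradients of only the first k of n workers each round removes the tail of the per-round barrier:

```python
import numpy as np

rng = np.random.default_rng(4)
n_workers, k, rounds, d, lr = 16, 12, 1000, 30, 0.1
w_true = rng.standard_normal(d)
w = np.zeros(d)

wall_clock_all, wall_clock_k = 0.0, 0.0
for _ in range(rounds):
    # Each worker computes a noisy gradient of f(w) = 0.5 * ||w - w_true||^2.
    grads = (w - w_true) + 0.5 * rng.standard_normal((n_workers, d))
    times = 1.0 + rng.exponential(1.0, n_workers)  # straggler-prone durations
    wall_clock_all += times.max()                  # full barrier: wait for slowest
    fastest = np.argsort(times)[:k]
    wall_clock_k += times[fastest].max()           # partial barrier: first k only
    w -= lr * grads[fastest].mean(axis=0)

print(f"param error {np.linalg.norm(w - w_true):.3f}")
print(f"time, full barrier: {wall_clock_all:.0f}; first-{k} barrier: {wall_clock_k:.0f}")
```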