Showing 1 - 10 of 32 for search: '"Moroshko, Edward"'
Author:
Evron, Itay; Moroshko, Edward; Buzaglo, Gon; Khriesh, Maroun; Marjieh, Badea; Srebro, Nathan; Soudry, Daniel
We analyze continual learning on a sequence of separable linear classification tasks with binary labels. We show theoretically that learning with weak regularization reduces to solving a sequential max-margin problem, corresponding to a special case…
External link:
http://arxiv.org/abs/2306.03534
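As a rough illustration of the reduction this entry states, the sketch below runs a continual-learning loop in which each task update projects the previous weights onto the current task's margin constraints (a sequential max-margin step). The toy data, the SLSQP solver, and all constants are illustrative assumptions, not the paper's setup.

```python
import numpy as np
from scipy.optimize import minimize

def project_onto_task(w_prev, X, y):
    # Sequential max-margin step: w_t = argmin ||w - w_{t-1}||^2
    # subject to y_i <w, x_i> >= 1 on the current task's data.
    cons = [{"type": "ineq", "fun": lambda w, xi=xi, yi=yi: yi * (xi @ w) - 1.0}
            for xi, yi in zip(X, y)]
    res = minimize(lambda w: np.sum((w - w_prev) ** 2), w_prev,
                   method="SLSQP", constraints=cons)
    return res.x

rng = np.random.default_rng(0)
d = 5
w = np.zeros(d)            # from zero, task 0 recovers the plain max-margin solution
for t in range(3):         # three toy separable tasks
    w_star = rng.normal(size=d)
    X = rng.normal(size=(20, d))
    y = np.sign(X @ w_star)
    w = project_onto_task(w, X, y)
    print(f"task {t}: min margin on this task = {np.min(y * (X @ w)):.3f}")
```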
Published in:
35th Annual Conference on Learning Theory (2022)
To better understand catastrophic forgetting, we study fitting an overparameterized linear model to a sequence of tasks with different input distributions. We analyze how much the model forgets the true labels of earlier tasks after training on subsequent tasks…
External link:
http://arxiv.org/abs/2205.09588
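A minimal sketch of the setting in this entry, assuming each task is fit by the minimum-norm correction that interpolates its data; forgetting is then read off as the training error regenerated on earlier tasks. Dimensions, noise-free labels, and the varying input scales are toy choices.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, T = 50, 10, 5                      # dimension >> samples per task
w_true = rng.normal(size=d)
tasks, w = [], np.zeros(d)
for t in range(T):
    X = rng.normal(size=(n, d)) * rng.uniform(0.5, 2.0)  # varying input distribution
    y = X @ w_true
    tasks.append((X, y))
    w = w + np.linalg.pinv(X) @ (y - X @ w)   # min-norm correction so that X w = y
    errors = [np.mean((Xo @ w - yo) ** 2) for Xo, yo in tasks]
    print(f"after task {t}: errors on tasks seen so far = "
          + ", ".join(f"{e:.2e}" for e in errors))
```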
Author:
Azulay, Shahar; Moroshko, Edward; Nacson, Mor Shpigel; Woodworth, Blake; Srebro, Nathan; Globerson, Amir; Soudry, Daniel
Recent work has highlighted the role of initialization scale in determining the structure of the solutions that gradient methods converge to. In particular, it was shown that large initialization leads to the neural tangent kernel regime solution, whereas…
External link:
http://arxiv.org/abs/2102.09769
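A toy sketch of the initialization-scale effect this abstract points to, using a diagonal parameterization w = u*u - v*v for underdetermined regression: gradient descent from a large scale tends toward a dense (kernel-like) solution, and from a small scale toward a sparse one. The model, data, and hyperparameters are assumptions for illustration; the paper's precise conditions differ.

```python
import numpy as np

def train(alpha, X, y, lr=2e-3, steps=300_000):
    # Diagonal parameterization: w = u*u - v*v, both factors at scale alpha.
    u = np.full(X.shape[1], alpha)
    v = np.full(X.shape[1], alpha)
    for _ in range(steps):                 # hyperparameters tuned only for this toy
        w = u * u - v * v
        g = X.T @ (X @ w - y)              # gradient of 0.5*||Xw - y||^2 wrt w
        u -= lr * 2 * u * g                # chain rule through u
        v += lr * 2 * v * g                # chain rule through v (opposite sign)
    return u * u - v * v

rng = np.random.default_rng(2)
X = rng.normal(size=(5, 20))
y = X[:, :2] @ np.array([1.0, -1.0])       # 2-sparse teacher
for alpha in (1.0, 1e-3):
    w = train(alpha, X, y)
    frac = np.sort(np.abs(w))[-2:].sum() / np.abs(w).sum()
    print(f"alpha={alpha:g}: fraction of L1 mass on top-2 coordinates = {frac:.2f}")
```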
Author:
Moroshko, Edward; Gunasekar, Suriya; Woodworth, Blake; Lee, Jason D.; Srebro, Nathan; Soudry, Daniel
We provide a detailed asymptotic study of gradient flow trajectories and their implicit optimization bias when minimizing the exponential loss over "diagonal linear networks". This is the simplest model displaying a transition between "kernel" and non-kernel regimes…
External link:
http://arxiv.org/abs/2007.06738
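A sketch of the setup this abstract describes: gradient descent (as a stand-in for gradient flow) on the exponential loss of a diagonal linear network over separable binary data, reporting the normalized margin of the resulting predictor. All constants are illustrative and may need tuning.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(30, 10))
y = np.sign(X @ rng.normal(size=10))       # separable by construction
alpha, lr = 0.1, 1e-3                      # initialization scale, step size (assumed)
u = np.full(10, alpha)
v = np.full(10, alpha)
for _ in range(200_000):
    w = u * u - v * v                      # diagonal linear network predictor
    margins = y * (X @ w)
    grad_w = -(X * y[:, None]).T @ np.exp(-margins)   # d/dw of sum exp(-y <w, x>)
    u -= lr * 2 * u * grad_w
    v += lr * 2 * v * grad_w
w = u * u - v * v
print("min normalized margin:", np.min(y * (X @ w)) / np.linalg.norm(w))
```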
Author:
Woodworth, Blake; Gunasekar, Suriya; Lee, Jason D.; Moroshko, Edward; Savarese, Pedro; Golan, Itay; Soudry, Daniel; Srebro, Nathan
A recent line of work studies overparametrized neural networks in the "kernel regime," i.e. when the network behaves during training as a kernelized linear predictor, and thus training with gradient descent has the effect of finding the minimum RKHS norm solution…
External link:
http://arxiv.org/abs/2002.09277
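To make the two regimes concrete, this sketch computes the two limiting interpolants the abstract contrasts for an underdetermined linear problem: the minimum-L2-norm solution (the kernel-regime endpoint, where the RKHS norm is the Euclidean norm) and the minimum-L1-norm solution (the rich-regime endpoint for diagonal models, under conditions stated in the paper). The data are toy assumptions.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(4)
X = rng.normal(size=(5, 20))
y = X[:, :2] @ np.array([2.0, -1.0])

w_l2 = np.linalg.pinv(X) @ y               # min ||w||_2 subject to Xw = y

# min ||w||_1 subject to Xw = y, via the standard split w = p - q with p, q >= 0
n = X.shape[1]
res = linprog(c=np.ones(2 * n), A_eq=np.hstack([X, -X]), b_eq=y,
              bounds=[(0, None)] * (2 * n))
w_l1 = res.x[:n] - res.x[n:]

print("min-L2 interpolant, L1 norm:", np.abs(w_l2).sum())
print("min-L1 interpolant, L1 norm:", np.abs(w_l1).sum())
```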
Author:
Woodworth, Blake; Gunasekar, Suriya; Savarese, Pedro; Moroshko, Edward; Golan, Itay; Lee, Jason; Soudry, Daniel; Srebro, Nathan
A recent line of work studies overparametrized neural networks in the "kernel regime," i.e. when the network behaves during training as a kernelized linear predictor, and thus training with gradient descent has the effect of finding the minimum RKHS norm solution…
External link:
http://arxiv.org/abs/1906.05827
Published in:
NeurIPS 2022
We consider the dynamic linear regression problem, where the predictor vector may vary with time. This problem can be modeled as a linear dynamical system, with non-constant observation operator, where the parameters that need to be learned are the…
External link:
http://arxiv.org/abs/1906.05591
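A sketch of the modeling idea in this entry: the time-varying predictor vector is the hidden state of a linear dynamical system whose observation operator is the current feature vector, so it changes at every step, and a Kalman filter tracks the weights. Random-walk dynamics and the noise levels are assumptions made for the toy example, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(5)
d, T, q, r = 3, 200, 1e-3, 1e-2          # dims, steps, process/observation noise (assumed)
w = rng.normal(size=d)                   # true drifting weight vector
mu, P = np.zeros(d), np.eye(d)           # filter state: posterior mean and covariance
errs = []
for t in range(T):
    w += np.sqrt(q) * rng.normal(size=d) # weights drift (random walk)
    x = rng.normal(size=d)               # observation operator H_t = x^T, new each step
    y = x @ w + np.sqrt(r) * rng.normal()
    P = P + q * np.eye(d)                # predict step (identity dynamics)
    k = P @ x / (x @ P @ x + r)          # Kalman gain
    errs.append((y - x @ mu) ** 2)       # one-step prediction error before the update
    mu = mu + k * (y - x @ mu)           # update step
    P = P - np.outer(k, x) @ P
print("mean squared one-step prediction error:", np.mean(errs))
```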
We suggest a new idea of Editorial Network - a mixed extractive-abstractive summarization approach, which is applied as a post-processing step over a given sequence of extracted sentences. Our network tries to imitate the decision process of a human…
External link:
http://arxiv.org/abs/1902.10360
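A heavily stylized sketch of the post-processing idea only: iterate over the extracted sentences and, per sentence, decide whether to keep, rewrite, or drop it. The decide and rewrite functions below are hypothetical stand-ins for illustration, not the paper's network.

```python
from typing import Callable, List

def editorial_pass(extracted: List[str],
                   decide: Callable[[str, List[str]], str],
                   rewrite: Callable[[str], str]) -> List[str]:
    summary: List[str] = []
    for sent in extracted:
        action = decide(sent, summary)   # one of 'keep' | 'rewrite' | 'drop'
        if action == "keep":
            summary.append(sent)
        elif action == "rewrite":
            summary.append(rewrite(sent))
        # 'drop': sentence is skipped entirely
    return summary

# Toy stand-ins: drop exact duplicates of what is already in the summary.
toy_decide = lambda s, so_far: "drop" if s in so_far else "keep"
print(editorial_pass(["First point.", "Second point.", "First point."],
                     toy_decide, lambda s: s))
```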
Author:
Kozdoba, Mark; Moroshko, Edward; Shani, Lior; Takagi, Takuya; Katoh, Takashi; Mannor, Shie; Crammer, Koby
In the context of Multi Instance Learning, we analyze the Single Instance (SI) learning objective. We show that when the data is unbalanced and the family of classifiers is sufficiently rich, the SI method is a useful learning algorithm. In particular…
External link:
http://arxiv.org/abs/1812.07010
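A sketch of the SI objective the entry analyzes: every instance simply inherits its bag's label, and a standard classifier is trained on the flattened instances. Scoring a bag by its maximum instance score is a common MIL convention assumed here; the data generator is a toy assumption.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(6)
w_star = rng.normal(size=5)
bags, bag_labels = [], []
for _ in range(100):
    inst = rng.normal(size=(8, 5))
    label = int(np.any(inst @ w_star > 2.0))   # bag positive iff some instance is
    bags.append(inst)
    bag_labels.append(label)

X_si = np.vstack(bags)                         # flatten all instances
y_si = np.repeat(bag_labels, 8)                # SI objective: instance <- bag label
clf = LogisticRegression(max_iter=1000).fit(X_si, y_si)

bag_scores = [clf.decision_function(b).max() for b in bags]   # max-instance rule
pred = (np.array(bag_scores) > 0).astype(int)
print("bag accuracy:", np.mean(pred == np.array(bag_labels)))
```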
Published in:
Advances in Neural Information Processing Systems 31 (2018), 7232-7243
In extreme classification problems, learning algorithms are required to map instances to labels from an extremely large label set. We build on a recent extreme classification framework with logarithmic time and space, and on a general approach for error correcting output codes…
External link:
http://arxiv.org/abs/1803.03319
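A sketch of the ingredients this abstract names: each of K labels gets a log2(K)-bit binary code (logarithmic space), one binary classifier is trained per bit, and decoding sums per-bit margins against each label's code, a simple loss-based rule. The paper's graph-based decoder is more refined; the data and classifiers here are toy assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
K, d, n, bits = 16, 10, 2000, 4               # 2^4 = 16 labels
centers = rng.normal(size=(K, d)) * 3
labels = rng.integers(0, K, size=n)
X = centers[labels] + rng.normal(size=(n, d))

codes = (labels[:, None] >> np.arange(bits)) & 1           # label -> bit pattern
clfs = [LogisticRegression(max_iter=1000).fit(X, codes[:, b]) for b in range(bits)]

margins = np.stack([c.decision_function(X) for c in clfs], axis=1)   # (n, bits)
all_codes = (np.arange(K)[:, None] >> np.arange(bits)) & 1           # (K, bits)
signs = 2 * all_codes - 1                      # bits as +/-1 for margin scoring
scores = margins @ signs.T                     # loss-based decoding: higher is better
pred = np.argmax(scores, axis=1)
print("train accuracy:", np.mean(pred == labels))
```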