Showing 1 - 10 of 977 for search: '"Safaryan, A."'
We introduce LDAdam, a memory-efficient optimizer for training large models, that performs adaptive optimization steps within lower dimensional subspaces, while consistently exploring the full parameter space during training. This strategy keeps the…
External link:
http://arxiv.org/abs/2410.16103
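The general idea of taking adaptive steps inside a lower-dimensional subspace can be illustrated with a generic projected-Adam step. This is a minimal sketch and not LDAdam itself. It assumes the caller supplies a fixed orthonormal basis, it represents vectors as plain Python lists, and it leaves out how LDAdam picks subspaces and carries optimizer state from one subspace to the next. All function names are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def project(basis: List[Vector], g: Vector) -> Vector:
    """Subspace coordinates P^T g; the k rows of `basis` are assumed orthonormal."""
    return [sum(b_i * g_i for b_i, g_i in zip(b, g)) for b in basis]


def lift(basis: List[Vector], z: Vector) -> Vector:
    """Map k subspace coordinates back to the full d-dimensional space: P z."""
    d = len(basis[0])
    return [sum(z_j * b[i] for z_j, b in zip(z, basis)) for i in range(d)]


def subspace_adam_step(
    params: Vector, grad: Vector, basis: List[Vector],
    m: Vector, v: Vector, t: int,
    lr: float = 1e-3, beta1: float = 0.9, beta2: float = 0.999, eps: float = 1e-8,
) -> Tuple[Vector, Vector, Vector]:
    """One Adam step whose moment estimates m, v have length k instead of d."""
    g_low = project(basis, grad)
    m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g_low)]
    v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, g_low)]
    m_hat = [mi / (1 - beta1 ** t) for mi in m]  # bias correction; t starts at 1
    v_hat = [vi / (1 - beta2 ** t) for vi in v]
    step = lift(basis, [mh / (math.sqrt(vh) + eps) for mh, vh in zip(m_hat, v_hat)])
    return [p - lr * s for p, s in zip(params, step)], m, v


# d = 3 parameters, k = 2: each moment estimate stores 2 numbers, not 3.
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
params, m, v = subspace_adam_step(
    [0.0, 0.0, 0.0], [0.5, -0.2, 0.7], basis, [0.0, 0.0], [0.0, 0.0], t=1
)
```

In this step the gradient component along the third coordinate lies outside the subspace and is ignored. To explore the full parameter space, the basis has to change over time, and that is exactly the part this sketch leaves out.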
The rising footprint of machine learning has led to a focus on imposing model sparsity as a means of reducing computational and memory costs. For deep neural networks (DNNs), the state-of-the-art accuracy-vs-sparsity is achieved by heuristics…
External link:
http://arxiv.org/abs/2408.17163
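One of the simplest heuristics for imposing sparsity is magnitude pruning: zero out the smallest-magnitude weights until a target fraction of them is zero. The sketch below shows only that generic baseline, not the method of the paper above. The function name and the flat-list weight representation are assumptions.

```python
from typing import List


def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Zero out the `sparsity` fraction of weights with the smallest |w|."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(round(sparsity * len(weights)))
    by_magnitude = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(by_magnitude[:n_prune])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


print(magnitude_prune([0.5, -0.1, 2.0, 0.05], sparsity=0.5))  # [0.5, 0.0, 2.0, 0.0]
```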
Authors:
Modoranu, Ionut-Vlad, Safaryan, Mher, Malinovsky, Grigory, Kurtic, Eldar, Robert, Thomas, Richtarik, Peter, Alistarh, Dan
We propose a new variant of the Adam optimizer [Kingma and Ba, 2014] called MICROADAM that specifically minimizes memory overheads, while maintaining theoretical convergence guarantees. We achieve this by compressing the gradient information before i…
External link:
http://arxiv.org/abs/2405.15593
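A common way to compress gradients without permanently losing the discarded information is top-k sparsification with error feedback. Whatever is dropped in one step is added back before the next step is compressed. The sketch below shows only that generic mechanism. The snippet above is cut off before it says how the compression error is handled, so pairing this mechanism with MicroAdam is an assumption. The compressed vector is what would be fed into Adam's moment estimates. Function names are hypothetical.

```python
from typing import List, Tuple


def top_k(x: List[float], k: int) -> List[float]:
    """Keep the k largest-magnitude entries of x and zero the rest."""
    keep = set(sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)[:k])
    return [xi if i in keep else 0.0 for i, xi in enumerate(x)]


def compress_with_error_feedback(
    grad: List[float], error: List[float], k: int
) -> Tuple[List[float], List[float]]:
    """Compress grad plus the accumulated error; return (compressed, new_error)."""
    corrected = [g + e for g, e in zip(grad, error)]
    compressed = top_k(corrected, k)
    new_error = [c - q for c, q in zip(corrected, compressed)]  # what was dropped
    return compressed, new_error


error = [0.0] * 4
sent, error = compress_with_error_feedback([0.1, -2.0, 0.5, 0.05], error, k=2)
# sent == [0.0, -2.0, 0.5, 0.0]; error == [0.1, 0.0, 0.0, 0.05]
sent, error = compress_with_error_feedback([0.1, 0.0, 0.0, 0.0], error, k=2)
# sent == [0.2, 0.0, 0.0, 0.05]: entries dropped earlier are eventually applied
```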
We analyze asynchronous-type algorithms for distributed SGD in the heterogeneous setting, where each worker has its own computation and communication speeds, as well as data distribution. In these algorithms, workers compute possibly stale and stocha…
External link:
http://arxiv.org/abs/2310.20452
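The setting described above, with workers of different speeds and data sending possibly stale gradients, can be simulated in a few lines. This is a minimal sketch of plain asynchronous SGD, not the algorithms analyzed in the paper. The scalar quadratic objectives, the speeds, and the function name are all hypothetical.

```python
import heapq
from typing import List


def async_sgd(targets: List[float], speeds: List[float], lr: float = 0.1,
              total_updates: int = 200) -> float:
    """Event-driven simulation of plain asynchronous SGD on a scalar x.

    Worker i owns f_i(x) = 0.5 * (x - targets[i]) ** 2 (its own data) and needs
    speeds[i] time units per gradient. The server applies each gradient as soon
    as it arrives, even though it was computed at an older (stale) iterate.
    """
    x = 0.0
    # Each event: (finish_time, worker_id, iterate the worker read when it started).
    events = [(speeds[i], i, x) for i in range(len(targets))]
    heapq.heapify(events)
    for _ in range(total_updates):
        t, i, x_read = heapq.heappop(events)
        x -= lr * (x_read - targets[i])                # stale gradient of f_i
        heapq.heappush(events, (t + speeds[i], i, x))  # worker restarts from new x
    return x


equal = async_sgd([1.0, 3.0], speeds=[1.0, 1.0])
skewed = async_sgd([1.0, 3.0], speeds=[1.0, 10.0])
```

With equal speeds, the iterate settles near the minimizer 2.0 of the average objective. When worker 0 is ten times faster, it sends most of the updates and pulls `x` toward its own target of 1.0. This shows why heterogeneous speeds and data make the analysis harder.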
Authors:
Galina Nifontova, Sofia Safaryan, Yana Khristidis, Olga Smirnova, Massoud Vosough, Anastasia Shpichka, Peter Timashev
Published in:
Stem Cell Research & Therapy, Vol 15, Iss 1, Pp 1-23 (2024)
Abstract. Background: Wound healing represents a complex biological process, critically important in clinical practice due to its direct implication in a patient’s recovery and quality of life. Conservative wound management frequently falls short in…
External link:
https://doaj.org/article/15d6086c4c9645bab090d6572aa90091
Knowledge distillation is a popular approach for enhancing the performance of "student" models, with lower representational capacity, by taking advantage of more powerful "teacher" models. Despite its apparent simplicity and widespread use, the u…
External link:
http://arxiv.org/abs/2305.17581
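The widely used temperature-scaled form of distillation trains the student on a mix of two signals: the ground-truth label and the teacher's softened output distribution. This is a minimal sketch of that standard loss, not the specific analysis in the paper above. The temperature, the weight `alpha`, and the function names are illustrative assumptions.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    scaled = [z / temperature for z in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: List[float], teacher_logits: List[float],
                      label: int, temperature: float = 2.0, alpha: float = 0.5) -> float:
    """alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE(student, label)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # Terms with a zero teacher probability contribute nothing to the KL sum.
    kl = sum(pt * (math.log(pt) - math.log(ps))
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    ce = -math.log(softmax(student_logits)[label])  # hard-label loss at T = 1
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce
```

The factor `temperature ** 2` keeps the soft-target term on a scale comparable to the hard-label term as the temperature grows. When the student already matches the teacher, the KL term vanishes and only the weighted cross-entropy remains.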
Authors:
Márton Albert Hajnal, Duy Tran, Zsombor Szabó, Andrea Albert, Karen Safaryan, Michael Einstein, Mauricio Vallejo Martelo, Pierre-Olivier Polack, Peyman Golshani, Gergő Orbán
Published in:
Nature Communications, Vol 15, Iss 1, Pp 1-17 (2024)
Abstract: Attention supports decision making by selecting the features that are relevant for decisions. Selective enhancement of the relevant features and inhibition of distractors have been proposed as potential neural mechanisms driving this selectio…
External link:
https://doaj.org/article/57b9791178e04df99a729e6f78fb8c1a
Authors:
Aprahamian, Ani, Margaryan, Amur, Kakoyan, Vanik, Zhamkochyan, Simon, Abrahamyan, Sergey, Elbakyan, Hayk, Mayilyan, Samvel, Piloyan, Arpine, Vardanyan, Henrik, Zohrabyan, Hamlet, Gevorgian, Lekdar, Ayvazyan, Robert, Papyan, Artashes, Ayvazyan, Garnik, Ghalumyan, Arsen, Margaryan, Narek, Rostomyan, Hasmik, Safaryan, Anna, Grigoryan, Bagrat, Vardanyan, Ashot, Yeremyan, Arsham, Annand, John, Livingston, Kenneth, Montgomery, Rachel, Achenbach, Patrick, Pochodzalla, Josef, Balabanski, Dimiter L., Nakamura, Satoshi N., Sharyy, Viatcheslav, Yvon, Dominique, Brodeur, Maxime
The development of the advanced Radio Frequency Timer of electrons is described. It is based on a helical deflector, which performs circular or elliptical sweeps of keV electrons by means of a 500 MHz radio frequency field. By converting a time distri…
External link:
http://arxiv.org/abs/2211.16091
Authors:
Nifontova, Galina¹, Safaryan, Sofia¹, Khristidis, Yana¹, Smirnova, Olga¹, Vosough, Massoud², Shpichka, Anastasia¹ (ana-shpichka@yandex.ru), Timashev, Peter¹,³
Published in:
Stem Cell Research & Therapy. 10/17/2024, Vol. 15 Issue 1, p1-23. 23p.
We study a class of distributed optimization algorithms that aim to alleviate high communication costs by allowing the clients to perform multiple local gradient-type training steps prior to communication. While methods of this type have been studied…
External link:
http://arxiv.org/abs/2210.16402
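The basic pattern described above, where clients take several local gradient steps and then communicate, is often called local SGD. This minimal sketch shows it on scalar quadratics; it is not the algorithm class studied in the paper. The client objectives, the step counts, and the function names are hypothetical.

```python
from typing import List


def local_sgd_round(x: float, client_targets: List[float], local_steps: int,
                    lr: float) -> float:
    """One communication round of local SGD on scalar quadratics.

    Client i holds f_i(x) = 0.5 * (x - client_targets[i]) ** 2. Every client
    starts from the shared model x, takes `local_steps` gradient steps without
    communicating, and the server then averages the resulting local models.
    """
    local_models = []
    for target in client_targets:
        y = x
        for _ in range(local_steps):
            y -= lr * (y - target)
        local_models.append(y)
    return sum(local_models) / len(local_models)


def train(client_targets: List[float], rounds: int = 50, local_steps: int = 5,
          lr: float = 0.1) -> float:
    x = 0.0
    for _ in range(rounds):
        x = local_sgd_round(x, client_targets, local_steps, lr)
    return x
```

Here 50 communication rounds cover 250 local gradient steps per client. All clients in this toy setup share the same curvature, so averaging converges to the minimizer 2.0 of the average objective for targets `[1.0, 3.0]`. When clients' objectives differ in curvature, local steps can bias the averaged model away from that minimizer ("client drift").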