Showing 1 - 10 of 9,824 for the search: '"Karp P"'
Author:
Karp, Dmitrii, Prilepkina, Elena
Investigation of the generalized trigonometric and hyperbolic functions containing two parameters has been a very active research area over the last decade. We believe, however, that their monotonicity and convexity properties with respect to parameters … (a standard definition of these functions is recalled after this entry).
External link:
http://arxiv.org/abs/2411.13442
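For orientation, one standard normalization of the two-parameter generalized sine used in this literature is the following (an assumption about the convention; the paper may normalize differently). For $p, q > 1$,
\[
\arcsin_{p,q}(x) \;=\; \int_0^x \bigl(1 - t^{q}\bigr)^{-1/p}\,\mathrm{d}t, \qquad x \in [0,1],
\]
and $\sin_{p,q}$ is the inverse of $\arcsin_{p,q}$ on $[0, \pi_{p,q}/2]$, where $\pi_{p,q} = \tfrac{2}{q}\,B\!\bigl(1-\tfrac{1}{p},\,\tfrac{1}{q}\bigr)$. The hyperbolic analogue is obtained by replacing $1 - t^{q}$ with $1 + t^{q}$ in the integrand.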
Author:
Saunshi, Nikunj, Karp, Stefani, Krishnan, Shankar, Miryoosefi, Sobhan, Reddi, Sashank J., Kumar, Sanjiv
Given the increasing scale of model sizes, novel training strategies like gradual stacking [Gong et al., 2019, Reddi et al., 2023] have garnered interest. Stacking enables efficient training by gradually growing the depth of a model in stages and using … (a minimal sketch of the stacking idea follows this entry).
External link:
http://arxiv.org/abs/2409.19044
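As an illustration of the stacking idea only, not the training recipe of the paper, here is a minimal PyTorch-style sketch that grows the depth of an encoder stack by copying its trained blocks; all names and hyperparameters are placeholders.

import copy
import torch.nn as nn

def make_block(d_model=256, n_heads=4):
    # One Transformer encoder block; the sizes are illustrative placeholders.
    return nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads, batch_first=True)

def grow_by_stacking(blocks: nn.ModuleList) -> nn.ModuleList:
    # Double the depth by appending deep copies of the current blocks, so the
    # new layers start from already-trained weights instead of random init.
    return nn.ModuleList(list(blocks) + [copy.deepcopy(b) for b in blocks])

stack = nn.ModuleList([make_block() for _ in range(3)])
# ... train the 3-block model for some steps, then grow and keep training ...
stack = grow_by_stacking(stack)  # now 6 blocks, seeded from the trained 3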
An elementary but very useful lemma due to Biernacki and Krzyż (1955) asserts that the ratio of two power series inherits monotonicity from that of the sequence of ratios of their corresponding coefficients. Over the last two decades it has been re… (the lemma in its usual formulation is stated after this entry).
External link:
http://arxiv.org/abs/2408.01755
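For reference, the lemma in its usual formulation (a paraphrase of the standard statement, not a quotation from the paper): if $f(x) = \sum_{n \ge 0} a_n x^{n}$ and $g(x) = \sum_{n \ge 0} b_n x^{n}$ are real power series converging on $(0, R)$ with $b_n > 0$ for all $n$, and the sequence $\{a_n / b_n\}_{n \ge 0}$ is increasing (decreasing), then the ratio $x \mapsto f(x)/g(x)$ is increasing (decreasing) on $(0, R)$.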
Gradient aggregation has long been identified as a major bottleneck in today's large-scale distributed machine learning training systems. One promising solution to mitigate such bottlenecks is gradient compression, directly reducing communicated gradient … (a generic top-k sparsification sketch follows this entry).
External link:
http://arxiv.org/abs/2407.01378
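As a generic illustration of gradient compression, here is a self-contained PyTorch sketch of top-k sparsification, one common scheme; it is not the method proposed in the work above.

import torch

def topk_compress(grad: torch.Tensor, k: int):
    # Keep only the k largest-magnitude entries and communicate (indices, values)
    # instead of the full dense gradient.
    flat = grad.flatten()
    _, indices = torch.topk(flat.abs(), k)
    return indices, flat[indices]

def topk_decompress(indices: torch.Tensor, values: torch.Tensor, shape) -> torch.Tensor:
    # Rebuild a dense gradient that is zero outside the transmitted entries.
    flat = torch.zeros(torch.Size(shape).numel(), dtype=values.dtype)
    flat[indices] = values
    return flat.reshape(shape)

g = torch.randn(4, 4)
idx, vals = topk_compress(g, k=4)            # ~4 values sent instead of 16
g_hat = topk_decompress(idx, vals, g.shape)  # approximate gradient after transfer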
Author:
Benrimoh, David, Armstrong, Caitrin, Mehltretter, Joseph, Fratila, Robert, Perlman, Kelly, Israel, Sonia, Kapelner, Adam, Parikh, Sagar V., Karp, Jordan F., Heller, Katherine, Turecki, Gustavo
INTRODUCTION: The pharmacological treatment of Major Depressive Disorder (MDD) relies on a trial-and-error approach. We introduce an artificial intelligence (AI) model aiming to personalize treatment and improve outcomes, which was deployed in the Ar…
External link:
http://arxiv.org/abs/2406.04993
Recently, there has been increasing interest in efficient pretraining paradigms for training Transformer-based models. Several recent approaches use smaller models to initialize larger models in order to save computation (e.g., stacking and fusion).
External link:
http://arxiv.org/abs/2406.02469
A quasi-exponential is an entire function of the form $e^{cu}p(u)$, where $p(u)$ is a polynomial and $c \in \mathbb{C}$. Let $V = \langle e^{h_1u}p_1(u), \dots, e^{h_Nu}p_N(u) \rangle$ be a vector space with a basis of quasi-exponentials. We show that … (a concrete instance of such a space is given after this entry).
External link:
http://arxiv.org/abs/2405.20229
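A concrete instance of the definition above, purely for illustration: with $N = 2$, $h_1 = 1$, $h_2 = 2$, $p_1(u) = u + 1$ and $p_2(u) = u^{2}$,
\[
V = \bigl\langle\, e^{u}(u+1),\; e^{2u}u^{2} \,\bigr\rangle
\]
is a two-dimensional vector space spanned by quasi-exponentials.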
Author:
Karp, Martin, Suarez, Estela, Meinke, Jan H., Andersson, Måns I., Schlatter, Philipp, Markidis, Stefano, Jansson, Niclas
The never-ending computational demand from simulations of turbulence makes computational fluid dynamics (CFD) a prime application use case for current and future exascale systems. High-order finite element methods, such as the spectral element method, … (the standard nodal SEM expansion is recalled after this entry).
External link:
http://arxiv.org/abs/2405.05640
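For background, the standard nodal form of the spectral element method (generic SEM, not the specific solver discussed above) expands the solution on each element in Lagrange interpolants through the Gauss-Lobatto-Legendre (GLL) points:
\[
u(x)\big|_{\Omega^{e}} \;\approx\; \sum_{i=0}^{N} u_i^{e}\, \ell_i\!\bigl(\xi(x)\bigr),
\qquad \ell_i(\xi_j) = \delta_{ij},
\]
where $\xi(x)$ maps $\Omega^{e}$ to the reference element $[-1,1]$ (tensor products of the $\ell_i$ in higher dimensions), so the unknowns are nodal values and the resulting operators stay element-local.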
As supercomputers' complexity has grown, the traditional boundaries between processor, memory, network, and accelerators have blurred, making a homogeneous computer model, in which the overall computer system is modeled as a continuous medium with ho…
External link:
http://arxiv.org/abs/2405.05639
Vision tasks are characterized by the properties of locality and translation invariance. The superior performance of convolutional neural networks (CNNs) on these tasks is widely attributed to the inductive bias of locality and weight sharing baked into … (a toy illustration of these two properties follows this entry).
External link:
http://arxiv.org/abs/2403.15707
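To make the two properties concrete, a toy NumPy sketch (purely illustrative, unrelated to the model studied above): a 1-D convolution applies one small shared kernel at every position (weight sharing), and each output depends only on a local window of the input (locality), whereas a dense layer would learn a separate full-width weight vector per output.

import numpy as np

def conv1d(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    # Slide the same kernel w over x: shared weights, local receptive field.
    k = len(w)
    return np.array([np.dot(x[i:i + k], w) for i in range(len(x) - k + 1)])

x = np.arange(8, dtype=float)
w = np.array([1.0, 0.0, -1.0])   # one shared 3-tap kernel: 3 parameters in total
print(conv1d(x, w))              # 6 outputs, each seeing only 3 neighbouring inputs
# A fully connected map from 8 inputs to the same 6 outputs would need 6 x 8 = 48
# independent weights and would let every output see the entire input.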