Showing 1 - 10 of 48,546 for search: '"A. Spector"'
Author:
Kumar, Tanishq, Ankner, Zachary, Spector, Benjamin F., Bordelon, Blake, Muennighoff, Niklas, Paul, Mansheej, Pehlevan, Cengiz, Ré, Christopher, Raghunathan, Aditi
Low precision training and inference affect both the quality and cost of language models, but current scaling laws do not account for this. In this work, we devise "precision-aware" scaling laws for both training and inference. We propose that training…
External link:
http://arxiv.org/abs/2411.04330
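As a rough illustration of what a "precision-aware" scaling law can look like, here is a minimal sketch that grafts a hypothetical precision-dependent effective parameter count onto a Chinchilla-style loss. All constants, the exponential form of n_eff, and the scale GAMMA are made-up placeholders, not the fit from this paper.

```python
# Illustrative sketch only: a Chinchilla-style scaling law with a
# hypothetical precision-dependent "effective parameter count".
# All constants are placeholders, not values from arXiv:2411.04330.
import math

A, B, E = 400.0, 410.0, 1.7    # made-up fit constants
ALPHA, BETA = 0.34, 0.28       # made-up exponents
GAMMA = 4.0                    # made-up precision-sensitivity scale (bits)

def n_eff(n_params: float, precision_bits: float) -> float:
    """Hypothetical effective parameter count: low precision shrinks it."""
    return n_params * (1.0 - math.exp(-precision_bits / GAMMA))

def predicted_loss(n_params: float, n_tokens: float, precision_bits: float) -> float:
    """Chinchilla-style loss with the precision-aware substitution N -> N_eff."""
    n = n_eff(n_params, precision_bits)
    return A / n**ALPHA + B / n_tokens**BETA + E

# A 1B-parameter model on 20B tokens, trained at 16, 8, and 4 bits:
for bits in (16, 8, 4):
    print(bits, round(predicted_loss(1e9, 2e10, bits), 4))
```

The qualitative takeaway is only that, under a law of this shape, the same parameter and token budget predicts a higher loss at lower precision.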
Author:
Roychowdhury, Prasun, Spector, Daniel
The main results of this paper are the establishment of sharp constants for several families of critical Sobolev embeddings. These inequalities were pioneered by David R. Adams, while the sharp constant in the first order case is due to Andrea Cianchi…
External link:
http://arxiv.org/abs/2411.00293
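For background on the family of inequalities meant here, Adams' classical exponential inequality in the critical case (stated from memory as context, not as this paper's new result) takes the form:

```latex
% Adams' critical-case exponential inequality, given as background;
% the sharp threshold beta_0(m,n) is the kind of constant at stake.
\[
  \sup_{\substack{u \in W_0^{m,\,n/m}(\Omega) \\ \|\nabla^m u\|_{L^{n/m}(\Omega)} \le 1}}
  \int_{\Omega} \exp\!\left( \beta\, |u(x)|^{\frac{n}{n-m}} \right) dx
  \;\le\; C(m,n)\,|\Omega|
  \qquad \text{for all } \beta \le \beta_0(m,n),
\]
```

with the inequality failing for any $\beta > \beta_0(m,n)$.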
The challenge of mapping AI architectures to GPU hardware is creating a critical bottleneck in AI progress. Despite substantial efforts, hand-written custom kernels fail to meet their theoretical performance thresholds, even on well-established operations…
External link:
http://arxiv.org/abs/2410.20399
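The "theoretical performance thresholds" mentioned above are typically roofline-style bounds. A minimal sketch of such a bound follows; the hardware numbers are illustrative round figures, not values from this paper.

```python
# Minimal roofline-model sketch: a kernel's attainable FLOP/s is capped by
# min(peak compute, arithmetic intensity * memory bandwidth).
# Hardware numbers below are illustrative, not taken from arXiv:2410.20399.
PEAK_FLOPS = 312e12      # ~312 TFLOP/s tensor-core peak (illustrative)
MEM_BW = 1.55e12         # ~1.55 TB/s HBM bandwidth (illustrative)

def attainable_flops(flops: float, bytes_moved: float) -> float:
    """Roofline bound for a kernel doing `flops` work over `bytes_moved` traffic."""
    intensity = flops / bytes_moved          # FLOPs per byte
    return min(PEAK_FLOPS, intensity * MEM_BW)

# Square matmul C = A @ B in fp16: 2*n^3 FLOPs, three n*n half-precision matrices.
n = 4096
flops = 2 * n**3
bytes_moved = 3 * n * n * 2
bound = attainable_flops(flops, bytes_moved)
print(f"arithmetic intensity: {flops / bytes_moved:.0f} FLOP/B, "
      f"roofline bound: {bound / 1e12:.0f} TFLOP/s")
```

A hand-written kernel that sustains well below this bound is the kind of gap the abstract calls a bottleneck.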
Author:
Zhang, Michael, Arora, Simran, Chalamala, Rahul, Wu, Alan, Spector, Benjamin, Singhal, Aaryan, Ramesh, Krithik, Ré, Christopher
Recent works show we can linearize large language models (LLMs) -- swapping the quadratic attentions of popular Transformer-based LLMs with subquadratic analogs, such as linear attention -- avoiding the expensive pretraining costs. However, linearizing…
External link:
http://arxiv.org/abs/2410.10254
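The quadratic-to-subquadratic swap the abstract refers to can be sketched with the generic linear-attention identity; this is the standard construction, not the specific method of this paper, and the feature map phi below is an arbitrary illustrative choice.

```python
# Generic contrast between quadratic softmax attention and linear attention.
# Not the method of arXiv:2410.10254; just the standard identity it builds on.
import numpy as np

def softmax_attention(Q, K, V):
    """Standard attention: the T x T score matrix makes this O(T^2) in length T."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0) + 1e-6):
    """Linear attention: replace exp(q.k) with phi(q).phi(k), so the statistic
    phi(K)^T V is shared across all queries -> O(T) in sequence length."""
    Qf, Kf = phi(Q), phi(K)
    kv = Kf.T @ V                          # (d, d_v): size independent of T
    z = Kf.sum(axis=0)                     # (d,): shared normalizer statistic
    return (Qf @ kv) / (Qf @ z)[:, None]

T, d = 8, 16
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, T, d))
print(softmax_attention(Q, K, V).shape, linear_attention(Q, K, V).shape)
```

The open problem the abstract gestures at is that the two functions above are not equal, so a converted model must somehow recover the quality of the softmax original.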
This paper illustrates a further application of topological data analysis to the study of self-organising models for chemical and biological systems. In particular, we investigate whether topological summaries can capture the parameter dependence of…
External link:
http://arxiv.org/abs/2409.20491
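One common topological summary is a persistence diagram. A toy sketch follows, assuming the open-source `ripser` package; the noisy circle stands in for a self-organising model's output and is not data from the paper.

```python
# Toy persistent-homology summary of a point cloud. Assumes the `ripser`
# package; the noisy ring below stands in for a pattern produced by a
# self-organising model (illustration only, not data from arXiv:2409.20491).
import numpy as np
from ripser import ripser

rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 200)
cloud = np.c_[np.cos(theta), np.sin(theta)] + 0.05 * rng.normal(size=(200, 2))

dgms = ripser(cloud)['dgms']          # persistence diagrams in degrees 0 and 1
h1 = dgms[1]
lifetimes = h1[:, 1] - h1[:, 0]
print("most persistent 1-cycle lifetime:", lifetimes.max())
# A single long-lived 1-cycle is the topological signature of a ring pattern;
# tracking such lifetimes across model parameters is one way summaries can
# capture parameter dependence.
```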
Author:
Domínguez, Oscar, Spector, Daniel
A central question which originates in the celebrated work in the 1980s of DiPerna and Majda asks what is the optimal decay $f > 0$ such that uniform rates $|\omega|(Q) \leq f(|Q|)$ of the vorticity maximal functions guarantee strong convergence…
External link:
http://arxiv.org/abs/2409.02344
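Set in display form, the question the abstract poses reads as follows; the sequence notation $\omega_n$ is our reconstruction from the abstract, and the sharp choice of $f$ is the paper's result, not stated here.

```latex
% The DiPerna--Majda-type question from the abstract, displayed.
% Here the \omega_n are vorticities of a sequence of approximate solutions
% and Q ranges over cubes; only the notation is reconstructed.
\[
  \text{Find the optimal decay } f > 0 \text{ such that }
  \sup_n |\omega_n|(Q) \le f(|Q|) \ \text{ for all cubes } Q
  \ \Longrightarrow\ \text{strong convergence of the approximate solutions.}
\]
```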
Author:
Kozlowski, Todd, Wei, Li-Wei, Spector, Aaron D., Hallal, Ayman, Fraedrich, Henry, Brotherton, Daniel C., Oceano, Isabella, Ejlli, Aldo, Grote, Hartmut, Hollis, Harold, Karan, Kanioar, Mueller, Guido, Tanner, D. B., Willke, Benno, Lindner, Axel
The Regeneration Cavity (RC) is a critical component of the Any Light Particle Search II (ALPS II) experiment. It increases the signal from possible axions and axion-like particles in the experiment by nearly four orders of magnitude. The total round…
External link:
http://arxiv.org/abs/2408.13218
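The four-orders-of-magnitude figure is a resonant power build-up. As a generic worked example of how mirror reflectivities produce such a factor (illustrative round numbers, not the ALPS II specifications):

```python
# Generic two-mirror cavity power build-up on resonance, as a worked example
# of how a regeneration cavity boosts a signal by ~10^4. Mirror values are
# illustrative round numbers, not the ALPS II parameters from arXiv:2408.13218.
import math

R1 = R2 = 0.99995          # mirror power reflectivities (illustrative)
T1 = 1 - R1                # input-coupler transmission, assuming no absorption

buildup = T1 / (1 - math.sqrt(R1 * R2))**2
print(f"resonant power build-up: {buildup:.0f}x "
      f"(~{math.log10(buildup):.1f} orders of magnitude)")
# -> about 2e4, i.e. roughly four orders of magnitude, as in the abstract.
```

The same algebra shows why the round-trip loss matters: any extra loss lowers the effective reflectivities and the build-up falls quadratically with the loss per round trip.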
Author:
Ein-Dor, Liat, Toledo-Ronen, Orith, Spector, Artem, Gretz, Shai, Dankin, Lena, Halfon, Alon, Katz, Yoav, Slonim, Noam
Prompts are how humans communicate with LLMs. Informative prompts are essential for guiding LLMs to produce the desired output. However, prompt engineering is often tedious and time-consuming, requiring significant expertise, limiting its widespread…
External link:
http://arxiv.org/abs/2408.04560
Counterexample-driven genetic programming (CDGP) uses specifications provided as formal constraints to generate the training cases used to evaluate evolving programs. It has also been extended to combine formal constraints and user-provided training cases…
External link:
http://arxiv.org/abs/2408.12604
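A minimal sketch of the counterexample-driven loop CDGP is built around follows; a brute-force checker stands in for a real constraint solver, simple (1+1)-style mutation stands in for full genetic programming, and the spec (match |2*x| on a finite domain) is purely illustrative.

```python
# Toy counterexample-driven loop in the spirit of CDGP (not the system of
# arXiv:2408.12604): verify a candidate against a formal spec, fold each
# counterexample into the training cases, and evolve against those cases.
import random

DOMAIN = range(-50, 51)
spec = lambda x: abs(2 * x)                    # formal constraint, as an oracle

def make_program(a, b):
    """Candidate program family: f(x) = a*|x| + b."""
    return lambda x: a * abs(x) + b

def find_counterexample(program):
    """Solver stand-in: search the domain for an input violating the spec."""
    return next((x for x in DOMAIN if program(x) != spec(x)), None)

def total_error(params, cases):
    """Fitness on the accumulated counterexample-derived training cases."""
    prog = make_program(*params)
    return sum(abs(prog(x) - y) for x, y in cases)

random.seed(0)
params, cases = (random.randint(-5, 5), random.randint(-5, 5)), []
for step in range(500):
    cx = find_counterexample(make_program(*params))
    if cx is None:
        print(f"verified after {step} steps: params={params}")
        break
    cases.append((cx, spec(cx)))               # counterexample -> training case
    mutant = tuple(p + random.choice((-1, 0, 1)) for p in params)
    if total_error(mutant, cases) <= total_error(params, cases):
        params = mutant                        # greedy accept on training cases
else:
    print("no verified program within budget:", params)
```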
Author:
Halfon, Alon, Gretz, Shai, Arviv, Ofir, Spector, Artem, Toledo-Ronen, Orith, Katz, Yoav, Ein-Dor, Liat, Shmueli-Scheuer, Michal, Slonim, Noam
Fine-tuning Large Language Models (LLMs) is an effective method to enhance their performance on downstream tasks. However, choosing the appropriate setting of tuning hyperparameters (HPs) is a labor-intensive and computationally expensive process. Here…
External link:
http://arxiv.org/abs/2407.18990