Search results

Report

Visualizing Loss Functions as Topological Landscape Profiles

Authors: Geniesse, Caleb; Chen, Jiaqing; Xie, Tiankai; Shi, Ge; Yang, Yaoqing; Morozov, Dmitriy; Perciano, Talita; Mahoney, Michael W.; Maciejewski, Ross; Weber, Gunther H.

In machine learning, a loss function measures the difference between model predictions and ground-truth (or target) values. For neural network models, visualizing how this loss changes as model parameters are varied can provide insights into the loca

External link: http://arxiv.org/abs/2411.12136

Report

Evaluating Loss Landscapes from a Topology Perspective

Authors: Xie, Tiankai; Geniesse, Caleb; Chen, Jiaqing; Yang, Yaoqing; Morozov, Dmitriy; Mahoney, Michael W.; Maciejewski, Ross; Weber, Gunther H.

Characterizing the loss of a neural network with respect to model parameters, i.e., the loss landscape, can provide valuable insights into properties of that model. Various methods for visualizing loss landscapes have been proposed, but less emphasis

External link: http://arxiv.org/abs/2411.09807

Report

Squeezed Attention: Accelerating Long Context Length LLM Inference

Authors: Hooper, Coleman; Kim, Sehoon; Mohammadzadeh, Hiva; Maheswaran, Monishwaran; Paik, June; Mahoney, Michael W.; Keutzer, Kurt; Gholami, Amir

Emerging Large Language Model (LLM) applications require long input prompts to perform complex downstream tasks like document analysis and code generation. For these long context length applications, the length of the input prompt poses a significant

External link: http://arxiv.org/abs/2411.09688

Report

$\spadesuit$ SPADE $\spadesuit$ Split Peak Attention DEcomposition

Authors: Wolff, Malcolm; Olivares, Kin G.; Oreshkin, Boris; Ruan, Sunny; Yang, Sitan; Katoch, Abhinav; Ramasubramanian, Shankar; Zhang, Youxin; Mahoney, Michael W.; Efimov, Dmitry; Quenneville-Bélair, Vincent

Published in: 38th Conference on Neural Information Processing Systems, Time Series in the Age of Large Models Workshop, 2024

Demand forecasting faces challenges induced by Peak Events (PEs) corresponding to special periods such as promotions and holidays. Peak events create significant spikes in demand followed by demand ramp down periods. Neural networks like MQCNN and MQ

External link: http://arxiv.org/abs/2411.05852

Report

How many classifiers do we need?

Authors: Kim, Hyunsuk; Hodgkinson, Liam; Theisen, Ryan; Mahoney, Michael W.

As performance gains through scaling data and/or model size experience diminishing returns, it is becoming increasingly popular to turn to ensembling, where the predictions of multiple models are combined to improve accuracy. In this paper, we provid

External link: http://arxiv.org/abs/2411.00328

Report

AlphaPruning: Using Heavy-Tailed Self Regularization Theory for Improved Layer-wise Pruning of Large Language Models

Authors: Lu, Haiquan; Zhou, Yefan; Liu, Shiwei; Wang, Zhangyang; Mahoney, Michael W.; Yang, Yaoqing

Recent work on pruning large language models (LLMs) has shown that one can eliminate a large number of parameters without compromising performance, making pruning a promising strategy to reduce LLM model size. Existing LLM pruning strategies typicall

External link: http://arxiv.org/abs/2410.10912

Report

Elucidating the Design Choice of Probability Paths in Flow Matching for Forecasting

Authors: Lim, Soon Hoe; Wang, Yijin; Yu, Annan; Hart, Emma; Mahoney, Michael W.; Li, Xiaoye S.; Erichson, N. Benjamin

Flow matching has recently emerged as a powerful paradigm for generative modeling and has been extended to probabilistic time series forecasting in latent spaces. However, the impact of the specific choice of probability path model on forecasting per

External link: http://arxiv.org/abs/2410.03229

Report

Mitigating Memorization In Language Models

Authors: Sakarvadia, Mansi; Ajith, Aswathy; Khan, Arham; Hudson, Nathaniel; Geniesse, Caleb; Chard, Kyle; Yang, Yaoqing; Foster, Ian; Mahoney, Michael W.

Language models (LMs) can "memorize" information, i.e., encode training data in their weights in such a way that inference-time queries can lead to verbatim regurgitation of that data. This ability to extract training data can be problematic, for exa

External link: http://arxiv.org/abs/2410.02159

Report

Tuning Frequency Bias of State Space Models

Authors: Yu, Annan; Lyu, Dongwei; Lim, Soon Hoe; Mahoney, Michael W.; Erichson, N. Benjamin

State space models (SSMs) leverage linear, time-invariant (LTI) systems to effectively learn sequences with long-range dependencies. By analyzing the transfer functions of LTI systems, we find that SSMs exhibit an implicit bias toward capturing low-f

External link: http://arxiv.org/abs/2410.02035

Report

Trust-Region Sequential Quadratic Programming for Stochastic Optimization with Random Models

Authors: Fang, Yuchen; Na, Sen; Mahoney, Michael W.; Kolar, Mladen

In this work, we consider solving optimization problems with a stochastic objective and deterministic equality constraints. We propose a Trust-Region Sequential Quadratic Programming method to find both first- and second-order stationary points. Our

External link: http://arxiv.org/abs/2409.15734
