Showing 1 - 10 of 17,815 results for search: '"P. Mahoney"'
Author:
Xie, Tiankai, Geniesse, Caleb, Chen, Jiaqing, Yang, Yaoqing, Morozov, Dmitriy, Mahoney, Michael W., Maciejewski, Ross, Weber, Gunther H.
Characterizing the loss of a neural network with respect to model parameters, i.e., the loss landscape, can provide valuable insights into properties of that model. Various methods for visualizing loss landscapes have been proposed, but less emphasis… (an illustrative sketch follows this entry)
External link:
http://arxiv.org/abs/2411.09807
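Not taken from the paper itself, but as context for the technique named in the abstract: a minimal sketch of the common random-direction slicing used to visualize loss landscapes, here on a toy quadratic loss (the loss function and all names are illustrative assumptions):

```python
import numpy as np

def loss(theta):
    # Toy stand-in for a neural network loss; in practice this would
    # evaluate the model on a batch of data at parameters `theta`.
    A = np.diag([1.0, 10.0])
    return 0.5 * theta @ A @ theta

theta_star = np.zeros(2)        # "trained" parameters (toy minimum)
d = np.random.randn(2)
d /= np.linalg.norm(d)          # random unit direction in parameter space

# 1-D slice of the landscape: L(theta* + alpha * d) over a grid of alphas.
alphas = np.linspace(-1.0, 1.0, 41)
slice_vals = [loss(theta_star + a * d) for a in alphas]

for a, v in zip(alphas[::10], slice_vals[::10]):
    print(f"alpha={a:+.2f}  loss={v:.4f}")
```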
Author:
Hooper, Coleman, Kim, Sehoon, Mohammadzadeh, Hiva, Maheswaran, Monishwaran, Paik, June, Mahoney, Michael W., Keutzer, Kurt, Gholami, Amir
Emerging Large Language Model (LLM) applications require long input prompts to perform complex downstream tasks like document analysis and code generation. For these long context length applications, the length of the input prompt poses a significant… (an illustrative sketch follows this entry)
External link:
http://arxiv.org/abs/2411.09688
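To make the cost of long prompts concrete, a back-of-the-envelope KV-cache memory calculation for a hypothetical 7B-class transformer; every configuration number below is an assumption for illustration, not a figure from the paper:

```python
# Back-of-the-envelope KV-cache size for a hypothetical 7B-class model.
# All configuration numbers below are illustrative assumptions.
n_layers   = 32
n_heads    = 32
head_dim   = 128
bytes_elem = 2          # fp16

# Keys and values are both cached: 2 tensors per layer per token.
bytes_per_token = 2 * n_layers * n_heads * head_dim * bytes_elem
print(bytes_per_token / 1024, "KiB per token")          # 512.0 KiB

for ctx in (4_096, 32_768, 131_072):
    gib = bytes_per_token * ctx / 2**30
    print(f"context {ctx:>7,} tokens -> KV cache ~{gib:6.1f} GiB")
```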
Author:
Wolff, Malcolm, Olivares, Kin G., Oreshkin, Boris, Ruan, Sunny, Yang, Sitan, Katoch, Abhinav, Ramasubramanian, Shankar, Zhang, Youxin, Mahoney, Michael W., Efimov, Dmitry, Quenneville-Bélair, Vincent
Published in:
In 38th Conference on Neural Information Processing Systems (NeurIPS 2024), Time Series in the Age of Large Models Workshop, 2024
Demand forecasting faces challenges induced by Peak Events (PEs) corresponding to special periods such as promotions and holidays. Peak events create significant spikes in demand followed by demand ramp-down periods. Neural networks like MQCNN and MQ… (an illustrative sketch follows this entry)
External link:
http://arxiv.org/abs/2411.05852
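A minimal sketch of the demand pattern the abstract describes, assuming a synthetic series with a promotion spike followed by an exponential ramp-down; all numbers are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 120
base = 100 + 10 * np.sin(2 * np.pi * np.arange(T) / 7)  # weekly seasonality
demand = base + rng.normal(0, 3, T)

# Inject a promotion-style peak event: a sharp spike followed by a
# ramp-down (post-promotion dip decaying back to baseline demand).
peak_day = 60
demand[peak_day] += 150
ramp = -30 * np.exp(-np.arange(1, 15) / 5.0)
demand[peak_day + 1 : peak_day + 15] += ramp

print(demand[peak_day - 2 : peak_day + 6].round(1))
```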
As performance gains through scaling data and/or model size experience diminishing returns, it is becoming increasingly popular to turn to ensembling, where the predictions of multiple models are combined to improve accuracy. In this paper, we provide… (an illustrative sketch follows this entry)
External link:
http://arxiv.org/abs/2411.00328
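A minimal sketch of the basic ensembling operation the abstract refers to, assuming simple probability averaging over stand-in model outputs; this illustrates the general idea, not the paper's specific analysis:

```python
import numpy as np

rng = np.random.default_rng(1)
n_models, n_samples, n_classes = 5, 8, 3

# Stand-in for per-model predicted class probabilities; in practice these
# would come from independently trained networks.
probs = rng.dirichlet(np.ones(n_classes), size=(n_models, n_samples))

ensemble = probs.mean(axis=0)       # simple probability averaging
labels = ensemble.argmax(axis=1)    # ensemble prediction per sample
print(labels)
```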
Recent work on pruning large language models (LLMs) has shown that one can eliminate a large number of parameters without compromising performance, making pruning a promising strategy to reduce LLM model size. Existing LLM pruning strategies typically… (an illustrative sketch follows this entry)
External link:
http://arxiv.org/abs/2410.10912
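As background for the pruning setting, a minimal sketch of one-shot magnitude pruning, the simplest member of this family; the paper's own strategy may differ:

```python
import numpy as np

def magnitude_prune(w, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of weights.

    Ties at the threshold may prune slightly more than requested.
    """
    k = int(sparsity * w.size)
    if k == 0:
        return w.copy()
    threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    pruned = w.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

w = np.random.default_rng(2).normal(size=(4, 4))
pruned = magnitude_prune(w, 0.5)
print("fraction zeroed:", (pruned == 0).mean())
```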
Author:
Lim, Soon Hoe, Wang, Yijin, Yu, Annan, Hart, Emma, Mahoney, Michael W., Li, Xiaoye S., Erichson, N. Benjamin
Flow matching has recently emerged as a powerful paradigm for generative modeling and has been extended to probabilistic time series forecasting in latent spaces. However, the impact of the specific choice of probability path model on forecasting performance… (an illustrative sketch follows this entry)
External link:
http://arxiv.org/abs/2410.03229
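A minimal sketch of a standard conditional flow matching training loss with a linear probability path, assuming PyTorch and a toy MLP; this illustrates the general paradigm, not the paper's latent-space forecasting model:

```python
import torch

def flow_matching_loss(model, x1):
    """One training step's loss for linear-path conditional flow matching.

    x1: batch of data samples; x0 ~ N(0, I) is the noise endpoint.
    Along the straight path x_t = (1 - t) x0 + t x1 the regression
    target is the constant velocity x1 - x0.
    """
    x0 = torch.randn_like(x1)
    t = torch.rand(x1.shape[0], 1)
    xt = (1 - t) * x0 + t * x1
    v_target = x1 - x0
    v_pred = model(torch.cat([xt, t], dim=1))
    return ((v_pred - v_target) ** 2).mean()

dim = 2
model = torch.nn.Sequential(
    torch.nn.Linear(dim + 1, 64), torch.nn.SiLU(), torch.nn.Linear(64, dim)
)
x1 = torch.randn(32, dim)      # stand-in for real data
print(flow_matching_loss(model, x1).item())
```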
Author:
Sakarvadia, Mansi, Ajith, Aswathy, Khan, Arham, Hudson, Nathaniel, Geniesse, Caleb, Chard, Kyle, Yang, Yaoqing, Foster, Ian, Mahoney, Michael W.
Language models (LMs) can "memorize" information, i.e., encode training data in their weights in such a way that inference-time queries can lead to verbatim regurgitation of that data. This ability to extract training data can be problematic, for example… (an illustrative sketch follows this entry)
External link:
http://arxiv.org/abs/2410.02159
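A minimal sketch of the kind of verbatim-regurgitation check the abstract alludes to, with a stand-in memorizing "model" so the snippet runs on its own; a real test would wrap an actual language model's decoder:

```python
def verbatim_continuation(generate, doc_tokens, prefix_len=8, k=8):
    """Check whether `generate` reproduces training data verbatim.

    Feeds the first `prefix_len` tokens of a training document to the
    model and tests whether the next `k` generated tokens exactly match
    the document's continuation.
    """
    prefix = doc_tokens[:prefix_len]
    continuation = generate(prefix, k)
    return continuation == doc_tokens[prefix_len:prefix_len + k]

# Stand-in "model" that has fully memorized one training document; in
# practice `generate` would call a real LM with greedy decoding.
doc = "the quick brown fox jumps over the lazy dog again and again".split()
memorized = {tuple(doc[:i]): doc[i] for i in range(1, len(doc))}

def generate(prefix, k):
    out = list(prefix)
    for _ in range(k):
        nxt = memorized.get(tuple(out))
        if nxt is None:
            break
        out.append(nxt)
    return out[len(prefix):]

print(verbatim_continuation(generate, doc, prefix_len=4, k=6))  # True
```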
State space models (SSMs) leverage linear, time-invariant (LTI) systems to effectively learn sequences with long-range dependencies. By analyzing the transfer functions of LTI systems, we find that SSMs exhibit an implicit bias toward capturing low-frequency… (an illustrative sketch follows this entry)
External link:
http://arxiv.org/abs/2410.02035
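A minimal sketch of the transfer-function analysis named in the abstract: for a toy diagonal discrete-time LTI system, the magnitude |H(e^{i omega})| = |C (e^{i omega} I - A)^{-1} B| is large at low frequencies and small near the Nyquist frequency; the system matrices below are illustrative assumptions:

```python
import numpy as np

# A toy diagonal discrete-time LTI state-space model:
#   x[k+1] = A x[k] + B u[k],  y[k] = C x[k]
# with stable poles close to 1, as is typical for long-memory SSMs.
A = np.diag([0.99, 0.95, 0.90])
B = np.ones((3, 1))
C = np.ones((1, 3))

def H(omega):
    """Transfer function magnitude |C (e^{i w} I - A)^{-1} B|."""
    z = np.exp(1j * omega)
    return abs((C @ np.linalg.inv(z * np.eye(3) - A) @ B).item())

for omega in (0.01, 0.1, 1.0, np.pi):
    print(f"omega={omega:5.2f}  |H|={H(omega):8.2f}")
```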
In this work, we consider solving optimization problems with a stochastic objective and deterministic equality constraints. We propose a Trust-Region Sequential Quadratic Programming method to find both first- and second-order stationary points. Our… (an illustrative sketch follows this entry)
External link:
http://arxiv.org/abs/2409.15734
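As a deterministic point of reference for this problem class, a toy equality-constrained problem solved with SciPy's trust-region constrained solver; this is not the paper's stochastic TR-SQP method, and the problem instance is an assumption:

```python
import numpy as np
from scipy.optimize import NonlinearConstraint, minimize

# Toy instance of the problem class: a deterministic objective standing
# in for E[f(x, xi)], with one deterministic equality constraint c(x) = 0.
def f(x):
    return (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2

# Equality constraint c(x) = x1^2 + x2^2 - 1 = 0 (the unit circle).
constraint = NonlinearConstraint(lambda x: x[0] ** 2 + x[1] ** 2 - 1.0, 0.0, 0.0)

res = minimize(f, x0=np.array([0.5, 0.5]), method="trust-constr",
               constraints=[constraint])
print(res.x, f(res.x))   # optimum on the unit circle, pulled toward (1, 2)
```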
Author:
Dcruz, Julian Gerald, Mahoney, Sam, Chua, Jia Yun, Soukhabandith, Adoundeth, Mugabe, John, Guo, Weisi, Arana-Catania, Miguel
Autonomous operations of robots in unknown environments are challenging due to the lack of knowledge of the dynamics of the interactions, such as the objects' movability. This work introduces a novel Causal Reinforcement Learning approach to enhancing…
External link:
http://arxiv.org/abs/2409.13423