Showing 1 - 10 of 25 for search: '"Richter, Mats L."'
Terrestrial carbon fluxes provide vital information about our biosphere's health and its capacity to absorb anthropogenic CO$_2$ emissions. The importance of predicting carbon fluxes has led to the emerging field of data-driven carbon flux modelling
External link:
http://arxiv.org/abs/2406.04940
Author:
Ibrahim, Adam, Thérien, Benjamin, Gupta, Kshitij, Richter, Mats L., Anthony, Quentin, Lesort, Timothée, Belilovsky, Eugene, Rish, Irina
Large language models (LLMs) are routinely pre-trained on billions of tokens, only to start the process over again once new data becomes available. A much more efficient solution is to continually pre-train these models, saving significant compute co
External link:
http://arxiv.org/abs/2403.08763
Author:
Gupta, Kshitij, Thérien, Benjamin, Ibrahim, Adam, Richter, Mats L., Anthony, Quentin, Belilovsky, Eugene, Rish, Irina, Lesort, Timothée
Large language models (LLMs) are routinely pre-trained on billions of tokens, only to restart the process over again once new data becomes available. A much cheaper and more efficient solution would be to enable the continual pre-training of these mo
External link:
http://arxiv.org/abs/2308.04014
Published in:
The Twelfth International Conference on Learning Representations (ICLR), 2024
We introduce Würstchen, a novel architecture for text-to-image synthesis that combines competitive performance with unprecedented cost-effectiveness for large-scale text-to-image diffusion models. A key contribution of our work is to develop a late
External link:
http://arxiv.org/abs/2306.00637
Author:
Richter, Mats L.
The design and adjustment of convolutional neural network architectures is an opaque and mostly trial-and-error-driven process. The main reason for this is the lack of proper paradigms beyond general conventions for the development of neural networks
External link:
https://osnadocs.ub.uni-osnabrueck.de/handle/ds-202205106814
Author:
Richter, Mats L., Pal, Christopher
Minimal changes to neural architectures (e.g. changing a single hyperparameter in a key layer) can lead to significant gains in predictive performance in Convolutional Neural Networks (CNNs). In this work, we present a new approach to receptive fiel
External link:
http://arxiv.org/abs/2211.14487
When optimizing convolutional neural networks (CNN) for a specific image-based task, specialists commonly overshoot the number of convolutional layers in their designs. By implication, these CNNs are unnecessarily resource intensive to train and depl
External link:
http://arxiv.org/abs/2106.12307
In this work we explore the information processing inside neural networks using logistic regression probes \cite{probes} and the saturation metric \cite{featurespace_saturation}. We show that problem difficulty and neural network capacity affect the
External link:
http://arxiv.org/abs/2106.09526
Published in:
Artificial Neural Networks and Machine Learning - ICANN 2021, pp. 133-144
Fully convolutional neural networks can process input of arbitrary size by applying a combination of downsampling and pooling. However, we find that fully convolutional image classifiers are not agnostic to the input size but rather show significant
External link:
http://arxiv.org/abs/2102.01582
Published in:
British Machine Vision Conference (BMVC) 2021
We propose layer saturation - a simple, online-computable method for analyzing the information processing in neural networks. First, we show that a layer's output can be restricted to the eigenspace of its variance matrix without performance loss. We
External link:
http://arxiv.org/abs/2006.08679