Showing 1 - 10 of 380 for search: '"Luschi, P"'
Author:
Cattaneo, Alberto, Bonner, Stephen, Martynec, Thomas, Luschi, Carlo, Barrett, Ian P, Justus, Daniel
Knowledge Graph Completion has been increasingly adopted as a useful method for several tasks in biomedical research, like drug repurposing or drug-target identification. To that end, a variety of datasets and Knowledge Graph Embedding models has bee
External link:
http://arxiv.org/abs/2409.04103
Author:
Blake, Charlie, Eichenberg, Constantin, Dean, Josef, Balles, Lukas, Prince, Luke Y., Deiseroth, Björn, Cruz-Salinas, Andres Felipe, Luschi, Carlo, Weinbach, Samuel, Orr, Douglas
The Maximal Update Parametrization ($\mu$P) aims to make the optimal hyperparameters (HPs) of a model independent of its size, allowing them to be swept using a cheap proxy model rather than the full-size target model. We present a new scheme, u-$\mu
External link:
http://arxiv.org/abs/2407.17465
Low-precision formats such as float8 have been introduced in machine learning accelerated hardware to improve computational efficiency for large language models training and inference. Nevertheless, adoption by the ML community has been slowed down b
External link:
http://arxiv.org/abs/2407.17353
Author:
Mathiasen, Alexander, Helal, Hatem, Balanca, Paul, Krzywaniak, Adam, Parviz, Ali, Hvilshøj, Frederik, Banaszewski, Blazej, Luschi, Carlo, Fitzgibbon, Andrew William
Density Functional Theory (DFT) accurately predicts the quantum chemical properties of molecules, but scales as $O(N_{\text{electrons}}^3)$. Schütt et al. (2019) successfully approximate DFT 1000x faster with Neural Networks (NN). Arguably, the big
External link:
http://arxiv.org/abs/2402.04030
Author:
Ribar, Luka, Chelombiev, Ivan, Hudlass-Galley, Luke, Blake, Charlie, Luschi, Carlo, Orr, Douglas
The computational difficulties of large language model (LLM) inference remain a significant obstacle to their widespread deployment. The need for many applications to support long input sequences and process them in large batches typically causes tok
External link:
http://arxiv.org/abs/2312.04985
Author:
Mathiasen, Alexander, Helal, Hatem, Klaser, Kerstin, Balanca, Paul, Dean, Josef, Luschi, Carlo, Beaini, Dominique, Fitzgibbon, Andrew, Masters, Dominic
The emergence of foundation models in Computer Vision and Natural Language Processing has resulted in immense progress on downstream tasks. This progress was enabled by datasets with billions of training examples. Similar benefits are yet to be unlo
External link:
http://arxiv.org/abs/2311.01135
Author:
Perez, Sergio P., Zhang, Yan, Briggs, James, Blake, Charlie, Levy-Kramer, Josh, Balanca, Paul, Luschi, Carlo, Barlow, Stephen, Fitzgibbon, Andrew William
FP8 formats are gaining popularity to boost the computational efficiency for training and inference of large deep learning models. Their main challenge is that a careful choice of scaling is needed to prevent degradation due to the reduced dynamic ra
External link:
http://arxiv.org/abs/2309.17224
Published in:
Endangered Species Research, Vol 54, Pp 395-408 (2024)
Knowledge of the distribution and density of marine species is key to understanding habitat use and interactions with human activities. Yet such information for sea turtles remains scarce, especially at foraging areas, where low turtle density repres
External link:
https://doaj.org/article/2f1645d42f244ca2b6ccf81e37e66112
We present unit scaling, a paradigm for designing deep learning models that simplifies the use of low-precision number formats. Training in FP16 or the recently proposed FP8 formats offers substantial efficiency gains, but can lack sufficient range f
External link:
http://arxiv.org/abs/2303.11257
Author:
Cattaneo, Alberto, Justus, Daniel, Mellor, Harry, Orr, Douglas, Maloberti, Jerome, Liu, Zhenying, Farnsworth, Thorin, Fitzgibbon, Andrew, Banaszewski, Blazej, Luschi, Carlo
We present the award-winning submission to the WikiKG90Mv2 track of OGB-LSC@NeurIPS 2022. The task is link-prediction on the large-scale knowledge graph WikiKG90Mv2, consisting of 90M+ nodes and 600M+ edges. Our solution uses a diverse ensemble of $8
External link:
http://arxiv.org/abs/2211.12281