Showing 1 - 10 of 1,636 results for search: '"A, Weinbach"'
Author:
Ali, Mehdi, Fromm, Michael, Thellmann, Klaudia, Ebert, Jan, Weber, Alexander Arno, Rutmann, Richard, Jain, Charvi, Lübbering, Max, Steinigen, Daniel, Leveling, Johannes, Klug, Katrin, Buschhoff, Jasper Schulze, Jurkschat, Lena, Abdelwahab, Hammam, Stein, Benny Jörg, Sylla, Karl-Heinz, Denisov, Pavel, Brandizzi, Nicolo', Saleem, Qasid, Bhowmick, Anirban, Helmer, Lennard, John, Chelsea, Suarez, Pedro Ortiz, Ostendorff, Malte, Jude, Alex, Manjunath, Lalith, Weinbach, Samuel, Penke, Carolin, Filatov, Oleg, Asaadi, Shima, Barth, Fabio, Sifa, Rafet, Küch, Fabian, Herten, Andreas, Jäkel, René, Rehm, Georg, Kesselheim, Stefan, Köhler, Joachim, Flores-Herr, Nicolas
We present two multilingual LLMs designed to embrace Europe's linguistic diversity by supporting all 24 official languages of the European Union. Trained on a dataset comprising around 60% non-English data and utilizing a custom multilingual tokenizer …
External link:
http://arxiv.org/abs/2410.03730
Author:
Blake, Charlie, Eichenberg, Constantin, Dean, Josef, Balles, Lukas, Prince, Luke Y., Deiseroth, Björn, Cruz-Salinas, Andres Felipe, Luschi, Carlo, Weinbach, Samuel, Orr, Douglas
The Maximal Update Parametrization ($\mu$P) aims to make the optimal hyperparameters (HPs) of a model independent of its size, allowing them to be swept using a cheap proxy model rather than the full-size target model. We present a new scheme, u-$\mu$P …
External link:
http://arxiv.org/abs/2407.17465
Tokenizers are crucial for encoding information in Large Language Models, but their development has recently stagnated, and they contain inherent weaknesses. Major limitations include computational overhead, ineffective vocabulary use, and unnecessarily …
External link:
http://arxiv.org/abs/2406.19223
Author:
Hagemann, Johannes, Weinbach, Samuel, Dobler, Konstantin, Schall, Maximilian, de Melo, Gerard
Efficiently training large language models requires parallelizing across hundreds of hardware accelerators and invoking various compute and memory optimizations. When combined, many of these strategies have complex interactions regarding the final training …
External link:
http://arxiv.org/abs/2311.05610
Author:
Ali, Mehdi, Fromm, Michael, Thellmann, Klaudia, Rutmann, Richard, Lübbering, Max, Leveling, Johannes, Klug, Katrin, Ebert, Jan, Doll, Niclas, Buschhoff, Jasper Schulze, Jain, Charvi, Weber, Alexander Arno, Jurkschat, Lena, Abdelwahab, Hammam, John, Chelsea, Suarez, Pedro Ortiz, Ostendorff, Malte, Weinbach, Samuel, Sifa, Rafet, Kesselheim, Stefan, Flores-Herr, Nicolas
The recent success of Large Language Models (LLMs) has been predominantly driven by curating the training dataset composition, scaling of model architectures and dataset sizes, and advancements in pretraining objectives, leaving tokenizer influence as …
External link:
http://arxiv.org/abs/2310.08754
Author:
Bellagente, Marco, Brack, Manuel, Teufel, Hannah, Friedrich, Felix, Deiseroth, Björn, Eichenberg, Constantin, Dai, Andrew, Baldock, Robert, Nanda, Souradeep, Oostermeijer, Koen, Cruz-Salinas, Andres Felipe, Schramowski, Patrick, Kersting, Kristian, Weinbach, Samuel
The recent popularity of text-to-image diffusion models (DM) can largely be attributed to the intuitive interface they provide to users. The intended generation can be expressed in natural language, with the model producing faithful interpretations of …
External link:
http://arxiv.org/abs/2305.15296
Author:
Deiseroth, Björn, Deb, Mayukh, Weinbach, Samuel, Brack, Manuel, Schramowski, Patrick, Kersting, Kristian
Generative transformer models have become increasingly complex, with large numbers of parameters and the ability to process multiple input modalities. Current methods for explaining their predictions are resource-intensive. Most crucially, they require …
External link:
http://arxiv.org/abs/2301.08110
Author:
Weinbach, Samuel, Bellagente, Marco, Eichenberg, Constantin, Dai, Andrew, Baldock, Robert, Nanda, Souradeep, Deiseroth, Björn, Oostermeijer, Koen, Teufel, Hannah, Cruz-Salinas, Andres Felipe
We introduce M-VADER: a diffusion model (DM) for image generation where the output can be specified using arbitrary combinations of images and text. We show how M-VADER enables the generation of images specified using combinations of image and text …
External link:
http://arxiv.org/abs/2212.02936
Published in:
Journal of Eating Disorders, Vol 12, Iss 1, Pp 1-11 (2024)
Abstract: Individuals exhibiting restrained eating behaviors demonstrate increased inhibitory control when exposed to food-related stimuli, indicating the presence of an automatic food-inhibition association. Existing literature proposes that this association …
External link:
https://doaj.org/article/c1c6eff965c0472eaf92ebe5a8419a9a
Author:
Black, Sid, Biderman, Stella, Hallahan, Eric, Anthony, Quentin, Gao, Leo, Golding, Laurence, He, Horace, Leahy, Connor, McDonell, Kyle, Phang, Jason, Pieler, Michael, Prashanth, USVSN Sai, Purohit, Shivanshu, Reynolds, Laria, Tow, Jonathan, Wang, Ben, Weinbach, Samuel
We introduce GPT-NeoX-20B, a 20 billion parameter autoregressive language model trained on the Pile, whose weights will be made freely and openly available to the public through a permissive license. It is, to the best of our knowledge, the largest dense …
External link:
http://arxiv.org/abs/2204.06745