Showing 1 - 10 of 964 results for search: '"Deiseroth A"'
Author:
Härle, Ruben, Friedrich, Felix, Brack, Manuel, Deiseroth, Björn, Schramowski, Patrick, Kersting, Kristian
Large Language Models (LLMs) have demonstrated remarkable capabilities in generating human-like text, but their output may not be aligned with the user's intent or may even produce harmful content. This paper presents a novel approach to detect and steer concepts…
External link:
http://arxiv.org/abs/2411.07122
Author:
Blake, Charlie, Eichenberg, Constantin, Dean, Josef, Balles, Lukas, Prince, Luke Y., Deiseroth, Björn, Cruz-Salinas, Andres Felipe, Luschi, Carlo, Weinbach, Samuel, Orr, Douglas
The Maximal Update Parametrization ($\mu$P) aims to make the optimal hyperparameters (HPs) of a model independent of its size, allowing them to be swept using a cheap proxy model rather than the full-size target model. We present a new scheme, u-$\mu$P…
External link:
http://arxiv.org/abs/2407.17465
Tokenizers are crucial for encoding information in Large Language Models, but their development has recently stagnated, and they contain inherent weaknesses. Major limitations include computational overhead, ineffective vocabulary use, and unnecessary…
External link:
http://arxiv.org/abs/2406.19223
Author:
Poli, Michael, Thomas, Armin W, Nguyen, Eric, Ponnusamy, Pragaash, Deiseroth, Björn, Kersting, Kristian, Suzuki, Taiji, Hie, Brian, Ermon, Stefano, Ré, Christopher, Zhang, Ce, Massaroli, Stefano
The development of deep learning architectures is a resource-demanding process, due to a vast design space, long prototyping times, and high compute costs associated with at-scale model training and evaluation. We set out to simplify this process by…
External link:
http://arxiv.org/abs/2403.17844
Author:
Deiseroth, Björn, Meuer, Max, Gritsch, Nikolas, Eichenberg, Constantin, Schramowski, Patrick, Aßenmacher, Matthias, Kersting, Kristian
Large Language Models (LLMs) have reshaped natural language processing with their impressive capabilities. However, their ever-increasing size has raised concerns about their effective deployment and the need for LLM compression. This study introduces…
External link:
http://arxiv.org/abs/2311.01544
Author:
Bellagente, Marco, Brack, Manuel, Teufel, Hannah, Friedrich, Felix, Deiseroth, Björn, Eichenberg, Constantin, Dai, Andrew, Baldock, Robert, Nanda, Souradeep, Oostermeijer, Koen, Cruz-Salinas, Andres Felipe, Schramowski, Patrick, Kersting, Kristian, Weinbach, Samuel
The recent popularity of text-to-image diffusion models (DM) can largely be attributed to the intuitive interface they provide to users. The intended generation can be expressed in natural language, with the model producing faithful interpretations of…
External link:
http://arxiv.org/abs/2305.15296
Author:
Deiseroth, Björn, Deb, Mayukh, Weinbach, Samuel, Brack, Manuel, Schramowski, Patrick, Kersting, Kristian
Generative transformer models have become increasingly complex, with large numbers of parameters and the ability to process multiple input modalities. Current methods for explaining their predictions are resource-intensive. Most crucially, they require…
External link:
http://arxiv.org/abs/2301.08110
Author:
Weinbach, Samuel, Bellagente, Marco, Eichenberg, Constantin, Dai, Andrew, Baldock, Robert, Nanda, Souradeep, Deiseroth, Björn, Oostermeijer, Koen, Teufel, Hannah, Cruz-Salinas, Andres Felipe
We introduce M-VADER: a diffusion model (DM) for image generation where the output can be specified using arbitrary combinations of images and text. We show how M-VADER enables the generation of images specified using combinations of image and text…
External link:
http://arxiv.org/abs/2212.02936
Author:
Hämmerl, Katharina, Deiseroth, Björn, Schramowski, Patrick, Libovický, Jindřich, Rothkopf, Constantin A., Fraser, Alexander, Kersting, Kristian
Pre-trained multilingual language models (PMLMs) are commonly used when dealing with data from multiple languages and cross-lingual transfer. However, PMLMs are trained on varying amounts of data for each language. In practice this means their performance…
External link:
http://arxiv.org/abs/2211.07733
Text-conditioned image generation models have recently achieved astonishing results in image quality and text alignment and are consequently employed in a fast-growing number of applications. Since they are highly data-driven, relying on billion-sized…
External link:
http://arxiv.org/abs/2211.05105