Showing 1 - 10 of 964 results for search: '"Deiseroth A"'
Author:
Härle, Ruben, Friedrich, Felix, Brack, Manuel, Deiseroth, Björn, Schramowski, Patrick, Kersting, Kristian
Large Language Models (LLMs) have demonstrated remarkable capabilities in generating human-like text, but their output may not be aligned with the user's intent or may even produce harmful content. This paper presents a novel approach to detect and steer concepts…
External link:
http://arxiv.org/abs/2411.07122
Author:
Blake, Charlie, Eichenberg, Constantin, Dean, Josef, Balles, Lukas, Prince, Luke Y., Deiseroth, Björn, Cruz-Salinas, Andres Felipe, Luschi, Carlo, Weinbach, Samuel, Orr, Douglas
The Maximal Update Parametrization ($\mu$P) aims to make the optimal hyperparameters (HPs) of a model independent of its size, allowing them to be swept using a cheap proxy model rather than the full-size target model. We present a new scheme, u-$\mu$P…
External link:
http://arxiv.org/abs/2407.17465
Tokenizers are crucial for encoding information in Large Language Models, but their development has recently stagnated, and they contain inherent weaknesses. Major limitations include computational overhead, ineffective vocabulary use, and unnecessary…
External link:
http://arxiv.org/abs/2406.19223
Author:
Poli, Michael, Thomas, Armin W, Nguyen, Eric, Ponnusamy, Pragaash, Deiseroth, Björn, Kersting, Kristian, Suzuki, Taiji, Hie, Brian, Ermon, Stefano, Ré, Christopher, Zhang, Ce, Massaroli, Stefano
The development of deep learning architectures is a resource-demanding process, due to a vast design space, long prototyping times, and high compute costs associated with at-scale model training and evaluation. We set out to simplify this process by…
External link:
http://arxiv.org/abs/2403.17844
Author:
Deiseroth, Björn, Meuer, Max, Gritsch, Nikolas, Eichenberg, Constantin, Schramowski, Patrick, Aßenmacher, Matthias, Kersting, Kristian
Large Language Models (LLMs) have reshaped natural language processing with their impressive capabilities. However, their ever-increasing size has raised concerns about their effective deployment and the need for LLM compression. This study introduces…
External link:
http://arxiv.org/abs/2311.01544
Author:
Bellagente, Marco, Brack, Manuel, Teufel, Hannah, Friedrich, Felix, Deiseroth, Björn, Eichenberg, Constantin, Dai, Andrew, Baldock, Robert, Nanda, Souradeep, Oostermeijer, Koen, Cruz-Salinas, Andres Felipe, Schramowski, Patrick, Kersting, Kristian, Weinbach, Samuel
The recent popularity of text-to-image diffusion models (DM) can largely be attributed to the intuitive interface they provide to users. The intended generation can be expressed in natural language, with the model producing faithful interpretations of…
External link:
http://arxiv.org/abs/2305.15296
Author:
Deiseroth, Björn, Deb, Mayukh, Weinbach, Samuel, Brack, Manuel, Schramowski, Patrick, Kersting, Kristian
Generative transformer models have become increasingly complex, with large numbers of parameters and the ability to process multiple input modalities. Current methods for explaining their predictions are resource-intensive. Most crucially, they require…
External link:
http://arxiv.org/abs/2301.08110
Author:
Weinbach, Samuel, Bellagente, Marco, Eichenberg, Constantin, Dai, Andrew, Baldock, Robert, Nanda, Souradeep, Deiseroth, Björn, Oostermeijer, Koen, Teufel, Hannah, Cruz-Salinas, Andres Felipe
We introduce M-VADER: a diffusion model (DM) for image generation where the output can be specified using arbitrary combinations of images and text. We show how M-VADER enables the generation of images specified using combinations of image and text…
External link:
http://arxiv.org/abs/2212.02936
Author:
Hämmerl, Katharina, Deiseroth, Björn, Schramowski, Patrick, Libovický, Jindřich, Rothkopf, Constantin A., Fraser, Alexander, Kersting, Kristian
Pre-trained multilingual language models (PMLMs) are commonly used when dealing with data from multiple languages and cross-lingual transfer. However, PMLMs are trained on varying amounts of data for each language. In practice this means their performance…
External link:
http://arxiv.org/abs/2211.07733
Text-conditioned image generation models have recently achieved astonishing results in image quality and text alignment and are consequently employed in a fast-growing number of applications. Since they are highly data-driven, relying on billion-sized…
External link:
http://arxiv.org/abs/2211.05105