Search results

Report

Circuit Breaking: Removing Model Behaviors with Targeted Ablation

Authors: Li, Maximilian; Davies, Xander; Nadeau, Max

Published in: Workshop on Challenges in Deployable Generative AI at the International Conference on Machine Learning (ICML), Honolulu, Hawaii, USA, 2023

Language models often exhibit behaviors that improve performance on a pre-training objective but harm performance on downstream tasks. We propose a novel approach to removing undesirable behaviors by ablating a small number of causal pathways between …

External link: http://arxiv.org/abs/2309.05973

Report

Benchmarks for Detecting Measurement Tampering

Authors: Roger, Fabien; Greenblatt, Ryan; Nadeau, Max; Shlegeris, Buck; Thomas, Nate

When training powerful AI systems to perform complex tasks, it may be challenging to provide training signals which are robust to optimization. One concern is measurement tampering, where the AI system manipulates multiple measurements to cr…

External link: http://arxiv.org/abs/2308.15605

Report

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

Reinforcement learning from human feedback (RLHF) is a technique for training AI systems to align with human goals. RLHF has emerged as the central method used to finetune state-of-the-art large language models (LLMs). Despite this popularity, there …

External link: http://arxiv.org/abs/2307.15217

Report

Discovering Variable Binding Circuitry with Desiderata

Authors: Davies, Xander; Nadeau, Max; Prakash, Nikhil; Shaham, Tamar Rott; Bau, David

Recent work has shown that computation in language models may be human-understandable, with successful efforts to localize and intervene on both single-unit features and input-output circuits. Here, we introduce an approach which extends causal media…

External link: http://arxiv.org/abs/2307.03637

Report

Robust Feature-Level Adversaries are Interpretability Tools

Authors: Casper, Stephen; Nadeau, Max; Hadfield-Menell, Dylan; Kreiman, Gabriel

The literature on adversarial attacks in computer vision typically focuses on pixel-level perturbations. These tend to be very difficult to interpret. Recent work that manipulates the latent representations of image generators to create "feature-leve…

External link: http://arxiv.org/abs/2110.03605

Academic article

Impacts of Mineral Dust on Trace Element Concentrations (As, Cd, Cu, Ni and Pb) in Lichens and Soils at Lhù'ààn Mân' (Yukon Territory, Canada).

Authors: Pouillé, Sophie¹ (sophie.pouille@umontreal.ca); Talbot, Julie¹; Tamalavage, Anne E.²; Kessler-Nadeau, Max Émile¹; King, James¹

Published in: Journal of Geophysical Research: Biogeosciences, June 2024, Vol. 129, Issue 6, pp. 1-17 (17 pp.)

Dissertation/Thesis

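Reading Expertise: Schematizing the Reading Behaviors of Communications Professionals (original French title: L'expertise en lecture : schématiser les comportements de lecture de professionnels des communications)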

Author: Nadeau, Maxime

The emergence of the concept of literacy, which arose around the 1980s from concern over alarming rates of functional illiteracy among schooled Western populations and from a definitional problem concerning the notion of functional illiteracy…

External link: http://savoirs.usherbrooke.ca/handle/11143/5361

Dissertation/Thesis

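Atmospheric Trace Element Contamination in Ombrotrophic Peatlands Located near a Copper Smelter (original French title: Contamination atmosphérique en éléments traces au sein de tourbières ombrotrophes situées à proximité d'une fonderie de cuivre)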

Author: Kessler-Nadeau, Max Émile

The Rouyn-Noranda region is heavily affected by contamination with trace elements (TEs) such as arsenic (As), cadmium (Cd), copper (Cu) and lead (Pb), originating from atmospheric deposition generated by the emissions…

External link: http://hdl.handle.net/1866/26493
