Search results - "Pan, Alexa Y."

Report

How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions

Authors: Pacchiardi, Lorenzo; Chan, Alex J.; Mindermann, Sören; Moscovitz, Ilan; Pan, Alexa Y.; Gal, Yarin; Evans, Owain; Brauner, Jan

Large language models (LLMs) can "lie", which we define as outputting false statements despite "knowing" the truth in a demonstrable sense. LLMs might "lie", for example, when instructed to output misinformation. Here, we develop a simple lie detecto

External link: http://arxiv.org/abs/2309.15840
