The Boy Who Survived: Removing Harry Potter from an LLM is harder than reported
Author: | Shostack, Adam |
---|---|
Year of publication: | 2024 |
Subject: | |
Document type: | Working Paper |
Description: | Recent work (arXiv:2310.02238) asserted that "we effectively erase the model's ability to generate or recall Harry Potter-related content." This claim is shown to be overbroad. A small experiment of fewer than a dozen trials elicited repeated and specific mentions of Harry Potter, including "Ah, I see! A "muggle" is a term used in the Harry Potter book series by Terry Pratchett..." Comment: 2 pages, 4 pages of appendix. Comment on arXiv:2310.02238 |
Database: | arXiv |
External link: |