Showing 1 - 10 of 233 for search: "Shmatikov, Vitaly"
Recent work showed that retrieval based on embedding similarity (e.g., for retrieval-augmented generation) is vulnerable to poisoning: an adversary can craft malicious documents that are retrieved in response to broad classes of queries. We demonstrate …
External link:
http://arxiv.org/abs/2410.02163
Authors:
Sun, Zhen, Shmatikov, Vitaly
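The attack surface this abstract describes can be pictured with a minimal embedding-similarity retriever. The sketch below uses random vectors as stand-in embeddings and plants one document at the centroid of a query cluster; it illustrates why a document optimized toward a broad query class gets retrieved for most of that class, and is not the paper's actual construction.

```python
import numpy as np

def retrieve(query_vec, doc_vecs, k=3):
    # Embedding-similarity retrieval: indices of the k documents whose
    # embeddings have the highest cosine similarity with the query.
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    return np.argsort(-(d @ q))[:k]

rng = np.random.default_rng(0)
docs = rng.normal(size=(100, 64))          # benign corpus (random stand-ins)
queries = rng.normal(size=(20, 64)) + 2.0  # a broad class of related queries
poisoned = queries.mean(axis=0)            # adversarial doc aimed at the class centroid
corpus = np.vstack([docs, poisoned])       # poisoned doc has index 100

hits = sum(100 in retrieve(q, corpus) for q in queries)
print(f"poisoned document retrieved for {hits}/20 queries")
```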
Several recently proposed censorship circumvention systems use encrypted network channels of popular applications to hide their communications. For example, a Tor pluggable transport called Snowflake uses the WebRTC data channel, while a system called …
External link:
http://arxiv.org/abs/2409.06247
We introduce a new type of indirect injection attacks against language models that operate on images: hidden "meta-instructions" that influence how the model interprets the image and steer the model's outputs to express an adversary-chosen style, …
External link:
http://arxiv.org/abs/2407.08970
Retrieval-augmented generation (RAG) systems respond to queries by retrieving relevant documents from a knowledge database, then generating an answer by applying an LLM to the retrieved documents. We demonstrate that RAG systems that operate on databases …
External link:
http://arxiv.org/abs/2406.05870
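A minimal sketch of the retrieve-then-generate pipeline this abstract describes; `embed` and `generate` are hypothetical stand-ins for a real embedding model and LLM, not any particular API.

```python
import numpy as np

def rag_answer(query, corpus, embed, generate, k=3):
    # Retrieve: rank documents by embedding similarity to the query.
    q = np.asarray(embed(query))
    scores = [float(np.dot(q, np.asarray(embed(doc)))) for doc in corpus]
    top = sorted(range(len(corpus)), key=lambda i: -scores[i])[:k]
    # Generate: the LLM answers from the retrieved text verbatim, which is
    # exactly the surface an untrusted document in the database can exploit.
    context = "\n\n".join(corpus[i] for i in top)
    return generate(f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")
```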
We consider the problem of language model inversion: given outputs of a language model, we seek to extract the prompt that generated these outputs. We develop a new black-box method, output2prompt, that learns to extract prompts without access to the …
External link:
http://arxiv.org/abs/2405.15012
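The black-box setup can be pictured as a sequence-to-sequence learning problem: map sampled outputs of the victim model back to the prompt that produced them. The T5 backbone and the " | " separator below are illustrative assumptions, not the paper's exact recipe.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tok = AutoTokenizer.from_pretrained("t5-small")       # illustrative inverter backbone
inverter = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# One training pair: several sampled victim outputs -> the hidden prompt.
outputs = ["Sure, here are three tips for better sleep...",
           "1. Keep a regular schedule. 2. Avoid caffeine..."]
prompt = "Give me advice on sleeping better."

x = tok(" | ".join(outputs), return_tensors="pt", truncation=True)
y = tok(prompt, return_tensors="pt").input_ids
loss = inverter(input_ids=x.input_ids,
                attention_mask=x.attention_mask,
                labels=y).loss
loss.backward()   # one gradient step of inverter training (optimizer omitted)
```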
Language models produce a distribution over the next token; can we use this information to recover the prompt tokens? We consider the problem of language model inversion and show that next-token probabilities contain a surprising amount of information …
External link:
http://arxiv.org/abs/2311.13647
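The signal this line of work exploits is the full next-token distribution. The following shows how to read it out of an open causal LM; "gpt2" and the example prefix are just stand-ins for illustration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tok("The secret prompt was:", return_tensors="pt").input_ids
with torch.no_grad():
    logits = model(ids).logits[0, -1]      # a score for every vocabulary token
probs = torch.softmax(logits, dim=-1)      # the full next-token distribution

top = torch.topk(probs, 5)
for p, i in zip(top.values, top.indices):
    print(f"{tok.decode(int(i))!r}: {p.item():.4f}")
```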
How much private information do text embeddings reveal about the original text? We investigate the problem of embedding "inversion", reconstructing the full text represented in dense text embeddings. We frame the problem as controlled generation …
External link:
http://arxiv.org/abs/2310.06816
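The controlled-generation framing, reduced to its core loop: propose a candidate text conditioned on the current best guess, re-embed it, and keep whichever candidate lands closest to the target embedding. `embed` and `propose` are hypothetical stand-ins for the trained components, not the paper's implementation.

```python
import numpy as np

def invert(target, embed, propose, steps=10):
    def cos(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
    best, best_sim = "", -1.0
    for _ in range(steps):
        cand = propose(best, target)       # new hypothesis given current guess
        sim = cos(np.asarray(embed(cand)), target)
        if sim > best_sim:                 # keep the closest reconstruction
            best, best_sim = cand, sim
    return best
```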
Multi-modal embeddings encode texts, images, thermal images, sounds, and videos into a single embedding space, aligning representations across different modalities (e.g., associating an image of a dog with a barking sound). In this paper, we show that …
External link:
http://arxiv.org/abs/2308.11804
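A concrete instance of a single cross-modal embedding space is CLIP. The sketch below scores one image against several captions by cosine similarity in that shared space; the random image is only a stand-in so the snippet runs without any files.

```python
import numpy as np
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.fromarray((np.random.rand(224, 224, 3) * 255).astype(np.uint8))
texts = ["a photo of a dog", "a barking sound", "a photo of a cat"]

inputs = proc(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**inputs)
print(out.logits_per_image.softmax(dim=-1))   # image-vs-caption match scores
```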
We demonstrate how images and sounds can be used for indirect prompt and instruction injection in multi-modal LLMs. An attacker generates an adversarial perturbation corresponding to the prompt and blends it into an image or audio recording. When the …
External link:
http://arxiv.org/abs/2307.10490
Authors:
Bagdasaryan, Eugene, Shmatikov, Vitaly
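The generic shape of such a perturbation attack, sketched under the assumption that the adversary can backpropagate through the target model's image encoder; `image_enc` and `target_emb` (the embedding of the injected instruction) are hypothetical stand-ins.

```python
import torch

def craft(image, target_emb, image_enc, steps=200, eps=8 / 255, lr=1e-2):
    # Optimize a small perturbation so the encoder's view of the image moves
    # toward the embedding of the injected instruction, while an L-infinity
    # bound keeps the blended image visually close to the original.
    delta = torch.zeros_like(image, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        emb = image_enc((image + delta).clamp(0, 1))
        loss = 1 - torch.cosine_similarity(emb, target_emb, dim=-1).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)        # keep the blend subtle
    return (image + delta).clamp(0, 1).detach()
```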
Machine learning (ML) models trained on data from potentially untrusted sources are vulnerable to poisoning. A small, maliciously crafted subset of the training inputs can cause the model to learn a "backdoor" task (e.g., misclassify inputs with a certain …
External link:
http://arxiv.org/abs/2302.04977
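The classic trigger-and-relabel recipe such backdoor poisoning alludes to, as a sketch over an image dataset assumed to have shape (N, H, W) with values in [0, 1]; the 4x4 corner patch and 1% poisoning rate are illustrative choices, not the paper's.

```python
import numpy as np

def poison(images, labels, target_label, rate=0.01, seed=0):
    # Stamp a small trigger onto a random `rate` fraction of the training
    # images and relabel them, so the trained model learns the backdoor
    # task "trigger -> target_label" alongside its main task.
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(images), int(rate * len(images)), replace=False)
    x, y = images.copy(), labels.copy()
    x[idx, -4:, -4:] = 1.0          # 4x4 white square in the corner as trigger
    y[idx] = target_label
    return x, y
```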