Abstrakt: |
The research is conducted on the topic of preventing various types of attacks on large language models, as well as preventing the leakage of confidential data when working with local text databases. Research is performed by implementing a filter and testing it on an example that aims to filter requests to the model. The proposed filter does not block the request to the large language models, but removes its parts, which is much faster and makes it impossible for an attacker to pick up a request, as it destroys its structure. The filter uses word embedding to evaluate the request to the large language models, which together with the use of a hash table for forbidden topics, speeds up the operation of the filter. To protect against attacks such as prompt injection and prompt leaking attack, the filter uses the method of randomly closing the sequence. During the testing process, significant improvements were obtained in maintaining the security of data used by large language models. Currently, the use of such filters in product projects and startups is an extremely important step, but there is a lack of ready-made implementations of filters with similar properties. The uniqueness of the filter lies in its independence from large language models and the use of semantic similarity as a fine-tuned way of classifying queries. [ABSTRACT FROM AUTHOR] |