Method for testing NLP models with text adversarial examples
Author: | Artem B. Menisov, Aleksandr G. Lomako, Timur R. Sabirov |
---|---|
Language: | English; Russian |
Year of publication: | 2023 |
Subject: | |
Source: | Naučno-tehničeskij Vestnik Informacionnyh Tehnologij, Mehaniki i Optiki, Vol 23, Iss 5, Pp 946-954 (2023) |
Document type: | article |
ISSN: | 2226-1494; 2500-0373 |
DOI: | 10.17586/2226-1494-2023-23-5-946-954 |
Description: | At present, the interpretability of Natural Language Processing (NLP) models is unsatisfactory due to imperfections in the scientific and methodological apparatus for describing the functioning of both individual elements and models as a whole. One problem associated with poor interpretability is the low reliability of neural networks that process natural-language texts: small perturbations in text data are known to affect their stability. The paper presents a method for testing NLP models against the threat of evasion attacks. The method comprises two techniques for generating text adversarial examples: random text modification and a modification generation network. Random text modification is performed using homoglyphs, rearranging text, inserting invisible characters, and deleting characters at random. The modification generation network is based on a generative adversarial neural-network architecture. The experiments conducted demonstrated the effectiveness of the testing method based on the network for generating text adversarial examples. The advantage of the developed method is, first, the possibility of generating more natural and diverse adversarial examples subject to fewer restrictions and, second, that multiple requests to the model under test are not required; this makes it applicable in more complex test scenarios where interaction with the model is limited. The experiments showed that the developed method achieved a comparatively better balance between the effectiveness and stealth of textual adversarial examples (the GigaChat and YaGPT models were among those tested). The results showed the need to test for defects and vulnerabilities that attackers can exploit to degrade the quality of NLP model functioning, and indicate considerable potential for ensuring the reliability of machine learning models.
A promising direction for future work is the problem of restoring the level of security (confidentiality, availability, and integrity) of NLP models. |
Database: | Directory of Open Access Journals |
External link: |
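The random text modification step described in the abstract (homoglyphs, rearranging text, invisible characters, random deletion) could be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the homoglyph table, the set of operation names, the perturbation rate, and the `perturb` function are all assumptions introduced here.

```python
import random

# Illustrative assumptions: a small Latin-to-Cyrillic look-alike table and a
# zero-width space standing in for the paper's "invisible characters".
HOMOGLYPHS = {"a": "\u0430", "c": "\u0441", "e": "\u0435", "o": "\u043e", "p": "\u0440"}
ZERO_WIDTH_SPACE = "\u200b"


def perturb(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Randomly modify text with homoglyphs, swaps, invisible chars, deletions."""
    rng = random.Random(seed)  # fixed seed keeps the perturbation reproducible
    chars = list(text)
    out = []
    i = 0
    while i < len(chars):
        ch = chars[i]
        if rng.random() < rate:
            op = rng.choice(["homoglyph", "swap", "invisible", "delete"])
            if op == "homoglyph" and ch.lower() in HOMOGLYPHS:
                out.append(HOMOGLYPHS[ch.lower()])  # look-alike substitution
            elif op == "swap" and i + 1 < len(chars):
                out.extend([chars[i + 1], ch])      # rearrange adjacent characters
                i += 1                              # skip the swapped neighbour
            elif op == "invisible":
                out.extend([ch, ZERO_WIDTH_SPACE])  # insert an invisible character
            elif op == "delete":
                pass                                # remove the character
            else:
                out.append(ch)                      # operation not applicable here
        else:
            out.append(ch)
        i += 1
    return "".join(out)
```

Perturbations like these change the token sequence a model sees while remaining visually close to the original text, which is what makes them useful as evasion-attack test inputs.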