Detecting and preventing violent behaviors of chat-bots

Author:	Lapedriza Carrillo, Francesc
Contributors:	Padró, Lluís; Lapedriza Garcia, Agata; Universitat Politècnica de Catalunya. Departament de Ciències de la Computació
Language:	English
Year of publication:	2020
Subject:	Online chat groups; Xats (Internet); Informàtica::Intel·ligència artificial::Aprenentatge automàtic [Àrees temàtiques de la UPC]; xarxes neuronals; text sentiment; svm; classifiers; neural networks; aprenentatge automàtic; classificadors; machine learning; processat de llenguatge natural; agents conversacionals; Natural language processing (Computer science); sentiment en el text; xatbots; natural language processing; Tractament del llenguatge natural (Informàtica); conversational agents
Source:	UPCommons. Portal del coneixement obert de la UPC; Universitat Politècnica de Catalunya (UPC)
Description:	Chatbots have existed for many years, but the first iterations of this technology were just sets of logical rules that answered questions in a scripted manner. More recently, the development of more powerful hardware and the refinement of machine learning algorithms have allowed these algorithms to be used in a multitude of fields. One of those fields is generative dialog systems. Generative dialog systems rely on large amounts of input data to create a model that emulates human behavior. The data used to train a dialog model can contain utterances that exhibit undesired behavior, and the dialog model can inherit that behavior as a consequence. Our project aims to study how those utterances affect the behavior of the resulting models and to propose a way to filter them out of the dataset. In particular, we create two datasets for training a state-of-the-art dialog model. One dataset is biased towards toxic sentences (the toxic dataset), while the other is biased towards non-toxic sentences (the non-toxic dataset). We then train the dialog model on each dataset separately. After that, we use a self-play scenario to test the behavior of the chatbots: we generate conversations by having each bot talk to itself, and then we analyze those conversations. Our analysis shows that the frequency of toxic sentences produced by the toxic bot (the bot trained on the toxic dataset) is four times higher than the frequency of toxic sentences produced by the non-toxic bot. This shows empirically that the behavior of a chatbot can be modulated by biasing its training data.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=dedup_wf_001::42ff4ecbac986e025f7ea7562cb901a0 http://hdl.handle.net/2117/340515
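
The evaluation procedure summarized in the description (split the data by toxicity, let each bot talk to itself, then compare how often toxic sentences appear) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the thesis: the keyword-based `is_toxic` check stands in for a trained toxicity classifier, `make_parrot_bot` is a hypothetical stub that replays its training sentences instead of a real generative dialog model, and the hard toxic/non-toxic split simplifies the "biased" datasets the record mentions.

```python
import itertools
from typing import Callable, List, Tuple

# Hypothetical placeholder for a trained toxicity classifier.
TOXIC_WORDS = {"idiot", "stupid", "hate"}


def is_toxic(sentence: str) -> bool:
    """Return True if the sentence contains any placeholder toxic keyword."""
    words = {w.strip(".,!?").lower() for w in sentence.split()}
    return bool(words & TOXIC_WORDS)


def split_by_toxicity(corpus: List[str]) -> Tuple[List[str], List[str]]:
    """Partition a corpus into (toxic, non_toxic) sentences.

    Assumption: a hard split; the record only says each dataset is
    'biased towards' one class.
    """
    toxic = [s for s in corpus if is_toxic(s)]
    non_toxic = [s for s in corpus if not is_toxic(s)]
    return toxic, non_toxic


def make_parrot_bot(dataset: List[str]) -> Callable[[str], str]:
    """Hypothetical stub bot: ignores the prompt and replays its training data."""
    replies = itertools.cycle(dataset)
    return lambda _prompt: next(replies)


def self_play(bot: Callable[[str], str], opener: str, turns: int) -> List[str]:
    """Let one bot talk to itself: each reply becomes the next prompt.

    Returns only the generated utterances, not the opener.
    """
    generated: List[str] = []
    prompt = opener
    for _ in range(turns):
        prompt = bot(prompt)
        generated.append(prompt)
    return generated


def toxic_frequency(utterances: List[str]) -> float:
    """Fraction of utterances flagged as toxic (0.0 for an empty list)."""
    if not utterances:
        return 0.0
    return sum(is_toxic(u) for u in utterances) / len(utterances)


corpus = ["I hate you", "You are stupid!", "Nice weather today", "How are you?"]
toxic_set, clean_set = split_by_toxicity(corpus)
toxic_talk = self_play(make_parrot_bot(toxic_set), "hello", turns=4)
clean_talk = self_play(make_parrot_bot(clean_set), "hello", turns=4)
print(toxic_frequency(toxic_talk), toxic_frequency(clean_talk))
```

With a real dialog model in place of the stub, the ratio of the two frequencies would be the quantity to compare against the fourfold difference reported in the description.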