A Multi-Input Machine Learning Approach to Classifying Sex Trafficking from Online Escort Advertisements

Authors: Lucia Summers, Alyssa N. Shallenberger, John Cruz, Lawrence V. Fulton
Language: English
Year of publication: 2023
Subject:
Source: Machine Learning and Knowledge Extraction, Vol 5, Iss 2, Pp 460-472 (2023)
Document type: article
ISSN: 2504-4990
DOI: 10.3390/make5020028
Description: Sex trafficking victims are often advertised through online escort sites. These ads can be publicly accessed, but law enforcement lacks the resources to comb through hundreds of ads to identify those that may feature sex-trafficked individuals. The purpose of this study was to implement and test multi-input, deep learning (DL) binary classification models to predict the probability of an online escort ad being associated with sex trafficking (ST) activity and aid in the detection and investigation of ST. Data from 12,350 scraped and classified ads were split into training and test sets (80% and 20%, respectively). Multi-input models that included recurrent neural networks (RNN) for text classification, convolutional neural networks (CNN, specifically EfficientNetB6 or ENET) for image/emoji classification, and neural networks (NN) for feature classification were trained and used to classify the 20% test set. The best-performing DL model included text and imagery inputs, resulting in an accuracy of 0.82 and an F1 score of 0.70. More importantly, the best classifier (RNN + ENET) correctly identified 14 of 14 sites that had classification probability estimates of 0.845 or greater (1.0 precision); precision was 96% for the multi-input model (NN + RNN + ENET) when only the ads associated with the highest positive classification probabilities (>0.90) were considered (n = 202 ads). The models developed could be productionalized and piloted with criminal investigators, as they could potentially increase their efficiency in identifying potential ST victims.
Database: Directory of Open Access Journals
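
Note: The description reports precision only for ads whose predicted probability clears a high cutoff (0.845 or greater in one case, above 0.90 in the other). The following is a minimal sketch of that kind of thresholded precision, not the authors' code. The function name `precision_at_threshold`, the inclusive `>=` comparison, and the sample values are assumptions chosen for illustration; the numbers are made up and are not data from the study.

```python
def precision_at_threshold(probs, labels, threshold):
    """Precision among items whose predicted probability is >= threshold.

    probs:  predicted probabilities of the positive class
    labels: true labels (1 = positive, 0 = negative), same order as probs
    Returns None when no item clears the threshold.
    """
    # Keep the true labels of only those items the model is confident about.
    selected = [y for p, y in zip(probs, labels) if p >= threshold]
    if not selected:
        return None
    # Fraction of confident predictions that are true positives.
    return sum(selected) / len(selected)


# Hypothetical values for illustration only.
probs = [0.95, 0.91, 0.88, 0.60, 0.30]
labels = [1, 1, 0, 1, 0]

print(precision_at_threshold(probs, labels, 0.90))
print(precision_at_threshold(probs, labels, 0.50))
```

Raising the cutoff typically shrinks the set of flagged ads while increasing the share of them that are true positives, which is the trade-off the description points to when it reports results for the highest-probability ads only.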