Consumption-based approaches in proactive detection for content moderation

Authors:	Shahar Elisha, John N. Pougué-Biyong, Mariano Beguerisse-Díaz
Language:	English
Publication year:	2024
Subject:	Content moderation; Proactive detection; Consumption networks; Label propagation; Node ranking; Collaborative filtering; Computer applications to medicine. Medical informatics; R858-859.7
Source:	EPJ Data Science, Vol 13, Iss 1, Pp 1-21 (2024)
Document type:	article
ISSN:	2193-1127
DOI:	10.1140/epjds/s13688-024-00505-x
Description:	Abstract: Implementing effective content moderation systems at scale is an unavoidable and complex challenge facing technology platforms. Developing systems that automate detection and removal of violative content is fraught with performance, safety and fairness considerations that make their implementation challenging. In particular, content-based systems require large amounts of data to train, cannot be easily transferred between contexts, and are susceptible to data drift. For these reasons, platforms employ a wide range of content classification models and rely heavily on human moderation, which can be prohibitively expensive to implement at scale. To address some of these challenges, we developed a framework that relies on consumption patterns to find high-quality leads for human reviewers to assess. This framework leverages consumption networks, and ranks candidate items for review using two techniques: Mean Percentile Ranking (MPR), which we have developed, and an adaptation of Label Propagation (LP). We demonstrate the effectiveness of this approach to find violative material in production settings using professional reviewers, and on a publicly available dataset from MovieLens. We compare our results with a popular collaborative filtering (CF) baseline, and we show that our approach outperforms CF in production settings. Then, we explore how performance can improve using Active Learning techniques. The key advantage of our approach is that it does not require any content-based data; it is able to find both low- and high-consumption items, and is easily scalable and cost effective to run.
Database:	Directory of Open Access Journals
External link:	https://doaj.org/article/fb0d103b22df439686614a867ce669a9