Graph embeddings for Abusive Language Detection

Authors: Noé Cecillon, Georges Linarès, Vincent Labatut, Richard Dufour
Contributors: Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI
Year of publication: 2021
Subjects:
FOS: Computer and information sciences
Theoretical computer science
General Computer Science
Language identification
Computer Networks and Communications
Computer science
Process (engineering)
Graph embedding
Automatic abuse detection
02 engineering and technology
Social networks
[INFO.INFO-SI]Computer Science [cs]/Social and Information Networks [cs.SI]
[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]
Conversational graph
Artificial Intelligence
020204 information systems
0202 electrical engineering, electronic engineering, information engineering
Set (psychology)
Representation (mathematics)
Structure (mathematical logic)
Social and Information Networks (cs.SI)
Online conversations
Node (networking)
Computer Science - Social and Information Networks
Computer Graphics and Computer-Aided Design
Graph
Computer Science Applications
Computational Theory and Mathematics
020201 artificial intelligence & image processing
Source: SN Computer Science
SN Computer Science, Springer, 2021, 2, pp.37. ⟨10.1007/s42979-020-00413-7⟩
ISSN: 2662-995X, 2661-8907
DOI: 10.48550/arxiv.2101.02988
Description: Abusive behaviors are common on online social networks. The increasing frequency of antisocial behaviors forces the hosts of online platforms to find new solutions to address this problem. Automating the moderation process has thus received a lot of interest in the past few years. Various methods have been proposed, most of them based on the exchanged content, and one relying on the structure and dynamics of the conversation. The latter has the advantage of being language-independent; however, it leverages a hand-crafted set of topological measures that are computationally expensive and not necessarily suitable for all situations. In the present paper, we propose to use recent graph embedding approaches to automatically learn representations of conversational graphs depicting message exchanges. We compare two categories: node vs. whole-graph embeddings. We experiment with a total of 8 approaches and apply them to a dataset of online messages. We also study more precisely which aspects of the graph structure are leveraged by each approach. Our study shows that the representation produced by certain embeddings captures the information conveyed by specific topological measures, but misses other aspects.
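To illustrate the node vs. whole-graph distinction mentioned in the description, here is a minimal sketch, not the authors' pipeline. It assumes a hypothetical toy conversational graph (users as nodes, an undirected edge when two users exchanged messages). All function names are hypothetical. The node-level features are hand-crafted stand-ins for learned node embeddings, which must be pooled (here by averaging) to describe the whole conversation. The whole-graph side uses a Weisfeiler-Lehman subtree-label histogram, a structure some whole-graph methods build on, with labels kept as raw strings rather than compressed.

```python
from collections import Counter

# Hypothetical toy conversational graph: nodes are users, an edge links
# two users who exchanged messages (undirected, unweighted for simplicity).
EDGES = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]


def adjacency(edges):
    """Build an undirected adjacency map from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def node_features(adj):
    """Toy per-node vector: (degree, number of links among the node's neighbours).

    Stand-in for a learned node embedding; a real method would train vectors.
    """
    feats = {}
    for n, nbrs in adj.items():
        links = sum(1 for x in nbrs for y in nbrs if x < y and y in adj[x])
        feats[n] = (len(nbrs), links)
    return feats


def mean_pool(feats):
    """Turn node-level vectors into one graph-level vector by averaging."""
    vecs = list(feats.values())
    dim = len(vecs[0])
    return tuple(sum(v[i] for v in vecs) / len(vecs) for i in range(dim))


def wl_histogram(adj, iterations=2):
    """Whole-graph features: counts of Weisfeiler-Lehman node labels.

    Labels start from node degree; each iteration appends the sorted labels
    of the neighbours. Labels stay as raw strings (no compression to IDs).
    """
    labels = {n: str(len(nbrs)) for n, nbrs in adj.items()}
    counts = Counter(labels.values())
    for _ in range(iterations):
        # The right-hand side is built entirely from the previous labels.
        labels = {
            n: labels[n] + "|" + ",".join(sorted(labels[m] for m in adj[n]))
            for n in adj
        }
        counts.update(labels.values())
    return counts


if __name__ == "__main__":
    adj = adjacency(EDGES)
    print("pooled node vector:", mean_pool(node_features(adj)))  # (2.0, 0.75)
    print("WL label counts:", wl_histogram(adj).most_common(3))
```

Because the WL histogram depends only on structure, renaming the users leaves it unchanged, which is the property a whole-graph representation of a conversation needs.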
Database: OpenAIRE