Behind Closed Words: Creating and Investigating the forePLay Annotated Dataset for Polish Erotic Discourse

Authors: Kołos, Anna; Lorenc, Katarzyna; Wiśnios, Emilia; Karlińska, Agnieszka
Publication year: 2024
Subject:
Document type: Working Paper
Description: The surge in online content has created an urgent demand for robust detection systems, especially in non-English contexts where current tools demonstrate significant limitations. We present forePLay, a novel Polish language dataset for erotic content detection, featuring over 24k annotated sentences with a multidimensional taxonomy encompassing the dimensions of ambiguity, violence, and social unacceptability. Our comprehensive evaluation demonstrates that specialized Polish language models achieve superior performance compared to multilingual alternatives, with transformer-based architectures showing particular strength in handling imbalanced categories. The dataset and accompanying analysis establish essential frameworks for developing linguistically aware content moderation systems, while highlighting critical considerations for extending such capabilities to morphologically complex languages.
Comment: The forePLay dataset and associated resources will be made publicly available for research purposes upon publication, in accordance with data sharing regulations.
Database: arXiv