3D CNN Architectures and Attention Mechanisms for Deepfake Detection

Authors:	Ritaban Roy, Indu Joshi, Abhijit Das, Antitza Dantcheva
Contributors:	Birla Institute of Technology and Science (BITS Pilani), Spatio-Temporal Activity Recognition Systems (STARS), Inria Sophia Antipolis - Méditerranée (CRISAM), Institut National de Recherche en Informatique et en Automatique (Inria), Indian Institute of Technology Delhi (IIT Delhi), Thapar University, Springer International Publishing
Publication year:	2022
Subject:	[INFO]Computer Science [cs]
Source:	Handbook of Digital Face Manipulation and Detection: From DeepFakes to Morphing Attacks. Springer International Publishing, 2022. Advances in Computer Vision and Pattern Recognition (ACVPR). ISBN: 9783030876630, 978-3-030-87666-1. ⟨10.1007/978-3-030-87664-7_10⟩
DOI:	10.1007/978-3-030-87664-7_10
Description:	Manipulated images and videos have become increasingly realistic due to the tremendous progress of deep convolutional neural networks (CNNs). While technically intriguing, such progress raises a number of social concerns related to the advent and spread of fake information and fake news. Such concerns necessitate robust and reliable methods for fake image and video detection. To this end, we study the ability of state-of-the-art video CNNs, including 3D ResNet, 3D ResNeXt, and I3D, to detect manipulated videos. In addition, toward more robust detection, we investigate the effectiveness of attention mechanisms in this context. Such mechanisms are introduced into CNN architectures to ensure that robust features are learnt. We test two attention mechanisms, namely the SE-block and Non-local networks. We present experimental results on videos tampered with by four manipulation techniques, as included in the FaceForensics++ dataset. We investigate three scenarios, in which the networks are trained to detect (a) all manipulated videos, (b) each manipulation technique individually, and (c) the veracity of videos produced by manipulation techniques not included in the training set.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::93fa6fa093b29ecee20f6c673b213911 https://doi.org/10.1007/978-3-030-87664-7_10
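
Illustrative note (not part of the record or the chapter): the description names the SE-block as one of the two attention mechanisms tested. As a rough idea of what such a block computes on video features, the following is a minimal NumPy sketch of a squeeze-and-excitation forward pass over a 5D feature map. The (N, C, T, H, W) layout, the reduction ratio of 4, the random weights, and the function name `se_block_3d` are all assumptions made for illustration; they do not reflect the authors' implementation or any particular library.

```python
import numpy as np


def se_block_3d(x, w1, b1, w2, b2):
    """Squeeze-and-excitation over a video feature map (hypothetical helper).

    x  : array of shape (N, C, T, H, W)
    w1 : (C, C // r) weights of the reducing layer, b1 : (C // r,)
    w2 : (C // r, C) weights of the expanding layer, b2 : (C,)
    Returns the rescaled features and the per-channel gates.
    """
    # Squeeze: global average over time and space gives one value per channel.
    z = x.mean(axis=(2, 3, 4))                      # (N, C)
    # Excitation: bottleneck layer with ReLU, then a sigmoid gate per channel.
    h = np.maximum(z @ w1 + b1, 0.0)                # (N, C // r)
    gates = 1.0 / (1.0 + np.exp(-(h @ w2 + b2)))    # (N, C), values in (0, 1)
    # Scale: broadcast each channel gate over T, H, W.
    return x * gates[:, :, None, None, None], gates


# Assumed toy sizes: 2 clips, 8 channels, 4 frames, 6x6 spatial grid, r = 4.
rng = np.random.default_rng(0)
N, C, T, H, W, r = 2, 8, 4, 6, 6, 4
x = rng.standard_normal((N, C, T, H, W))
w1 = rng.standard_normal((C, C // r)) * 0.1
b1 = np.zeros(C // r)
w2 = rng.standard_normal((C // r, C)) * 0.1
b2 = np.zeros(C)

y, gates = se_block_3d(x, w1, b1, w2, b2)
print(y.shape, gates.shape)
```

In a trained network the two weight matrices would be learned jointly with the convolutional layers, so that the gates emphasise channels that are informative for the detection task; here they are random and serve only to show the data flow.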