Author:
Kharel, Aaditya; Paranjape, Manas; Bera, Aniket
Publication Year:
2023
Subject:
Document Type:
Working Paper
Description:
With the rise in manipulated media, deepfake detection has become an imperative task for preserving the authenticity of digital content. In this paper, we present a novel multi-modal audio-video framework designed to concurrently process audio and video inputs for deepfake detection tasks. Our model exploits lip synchronization with the input audio through a cross-attention mechanism, while visual cues are extracted via a fine-tuned VGG-16 network. A transformer encoder network is then employed to perform facial self-attention. We conduct multiple ablation studies highlighting different strengths of our approach. Our multi-modal methodology outperforms state-of-the-art multi-modal deepfake detection techniques in terms of F1 and per-video AUC scores.
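As a rough illustration of the pipeline the abstract describes, the following is a minimal PyTorch sketch, not the authors' implementation: the framework choice, layer sizes, audio feature dimension (128), fusion step, and classification head are all assumptions, since the record does not specify them.

import torch
import torch.nn as nn
from torchvision.models import vgg16, VGG16_Weights

class MultiModalDeepfakeDetector(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_layers=2):
        super().__init__()
        # Fine-tuned VGG-16 backbone for per-frame visual cues, as named in the abstract.
        backbone = vgg16(weights=VGG16_Weights.DEFAULT)
        self.visual = nn.Sequential(*list(backbone.features), nn.AdaptiveAvgPool2d(1))
        self.visual_proj = nn.Linear(512, d_model)
        # Audio features (e.g. per-frame spectrogram embeddings) projected to d_model;
        # the 128-dim input is an assumed placeholder.
        self.audio_proj = nn.Linear(128, d_model)
        # Cross-attention: visual (lip) tokens query the audio stream, modeling
        # audio-visual lip-sync consistency.
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Transformer encoder performing self-attention over the facial tokens.
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.classifier = nn.Linear(d_model, 1)  # one real-vs-fake logit per video

    def forward(self, frames, audio):
        # frames: (B, T, 3, H, W) face crops; audio: (B, T, 128) audio features.
        B, T = frames.shape[:2]
        v = self.visual(frames.flatten(0, 1)).flatten(1)      # (B*T, 512)
        v = self.visual_proj(v).view(B, T, -1)                # (B, T, d_model)
        a = self.audio_proj(audio)                            # (B, T, d_model)
        synced, _ = self.cross_attn(query=v, key=a, value=a)  # lip-audio sync cues
        z = self.encoder(v + synced)                          # facial self-attention
        return self.classifier(z.mean(dim=1)).squeeze(-1)     # (B,) logits

The additive fusion of visual and cross-attended features before the encoder is one plausible design choice among several; the paper may combine the streams differently.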
Database:
arXiv
External Link: