Author:
Kharel, Aaditya; Paranjape, Manas; Bera, Aniket
Publication Year:
2023
Subject:
Document Type:
Working Paper
Description:
With the rise in manipulated media, deepfake detection has become an imperative task for preserving the authenticity of digital content. In this paper, we present a novel multi-modal audio-video framework designed to concurrently process audio and video inputs for deepfake detection tasks. Our model exploits lip synchronization with the input audio through a cross-attention mechanism, while visual cues are extracted via a fine-tuned VGG-16 network. A transformer encoder network is then employed to perform facial self-attention. We conduct multiple ablation studies highlighting different strengths of our approach. Our multi-modal methodology outperforms state-of-the-art multi-modal deepfake detection techniques in terms of F1 and per-video AUC scores.
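As a rough illustration of the pipeline the abstract describes, the following is a minimal PyTorch sketch, not the authors' implementation: the framework choice, layer sizes, audio feature dimension (128), fusion step, and classification head are all assumptions, since the record does not specify them.

import torch
import torch.nn as nn
from torchvision.models import vgg16, VGG16_Weights

class MultiModalDeepfakeDetector(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_layers=2):
        super().__init__()
        # Fine-tuned VGG-16 backbone for per-frame visual cues, as named in the abstract.
        backbone = vgg16(weights=VGG16_Weights.DEFAULT)
        self.visual = nn.Sequential(*list(backbone.features), nn.AdaptiveAvgPool2d(1))
        self.visual_proj = nn.Linear(512, d_model)
        # Audio features (e.g. per-frame spectrogram embeddings) projected to d_model;
        # the 128-dim input is an assumed placeholder.
        self.audio_proj = nn.Linear(128, d_model)
        # Cross-attention: visual (lip) tokens query the audio stream, modeling
        # audio-visual lip-sync consistency.
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Transformer encoder performing self-attention over the facial tokens.
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.classifier = nn.Linear(d_model, 1)  # one real-vs-fake logit per video

    def forward(self, frames, audio):
        # frames: (B, T, 3, H, W) face crops; audio: (B, T, 128) audio features.
        B, T = frames.shape[:2]
        v = self.visual(frames.flatten(0, 1)).flatten(1)      # (B*T, 512)
        v = self.visual_proj(v).view(B, T, -1)                # (B, T, d_model)
        a = self.audio_proj(audio)                            # (B, T, d_model)
        synced, _ = self.cross_attn(query=v, key=a, value=a)  # lip-audio sync cues
        z = self.encoder(v + synced)                          # facial self-attention
        return self.classifier(z.mean(dim=1)).squeeze(-1)     # (B,) logits

The additive fusion of visual and cross-attended features before the encoder is one plausible design choice among several; the paper may combine the streams differently.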
Database:
arXiv
External Link: