Showing 1 - 10 of 45 results for search: '"Oneata, Dan"'
In this paper, we demonstrate that attacks in the latest ASVspoof5 dataset -- a de facto standard in the field of voice authenticity and deepfake detection -- can be identified with surprising accuracy using a small subset of very simplistic features… (a minimal sketch of such a feature-based detector follows below)
External link:
http://arxiv.org/abs/2408.15775
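To make the idea concrete, here is a minimal, hypothetical sketch of a detector built on a few very simple waveform statistics and a linear classifier. The feature choices and model are illustrative assumptions, not the ones used in the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def simple_features(wav: np.ndarray) -> np.ndarray:
    """A handful of very simplistic waveform statistics (illustrative only)."""
    return np.array([
        wav.mean(),                           # DC offset
        wav.std(),                            # overall energy
        np.abs(np.diff(wav)).mean(),          # mean absolute first difference
        (np.diff(np.sign(wav)) != 0).mean(),  # zero-crossing rate
    ])

def fit_detector(waveforms, labels):
    """waveforms: list of 1-D arrays; labels: 0 = bona fide, 1 = spoofed."""
    X = np.stack([simple_features(w) for w in waveforms])
    clf = make_pipeline(StandardScaler(), LogisticRegression())
    clf.fit(X, labels)
    return clf
```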
Audio deepfake detection has become a pivotal task over the last couple of years, as many recent speech synthesis and voice cloning systems generate highly realistic speech samples, thus enabling their use in malicious activities. In this paper we…
External link:
http://arxiv.org/abs/2408.07414
Author:
Oneata, Dan, Kamper, Herman
Visually grounded speech models link speech to images. We extend this connection by linking images to text via an existing image captioning system, and as a result gain the ability to map speech audio directly to text. This approach can be used for… (a sketch of this pipeline follows below)
External link:
http://arxiv.org/abs/2406.07133
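A minimal sketch of the pipeline as described: map speech to the nearest image in the visually grounded embedding space, then return that image's caption. The encoder and captioner interfaces here are hypothetical placeholders, not the paper's actual components.

```python
from typing import Callable
import numpy as np

def speech_to_text(
    speech: np.ndarray,
    embed_speech: Callable[[np.ndarray], np.ndarray],  # VGS speech encoder (placeholder)
    image_embeddings: np.ndarray,                      # (N, D), rows L2-normalised
    caption_image: Callable[[int], str],               # existing captioner, by image index
) -> str:
    """Speech -> nearest image (via VGS similarity) -> caption of that image."""
    q = embed_speech(speech)
    q = q / np.linalg.norm(q)
    sims = image_embeddings @ q  # cosine similarities against all images
    return caption_image(int(np.argmax(sims)))
```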
When children learn new words, they employ constraints such as the mutual exclusivity (ME) bias: a novel word is mapped to a novel object rather than a familiar one. This bias has been studied computationally, but only in models that use discrete… (a toy illustration of the ME decision follows below)
External link:
http://arxiv.org/abs/2403.13922
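As a toy formalisation of the ME bias (our own illustration, not the paper's model): the learner prefers the candidate object that matches the novel word but is not already well covered by a familiar name.

```python
import numpy as np

def me_choice(novel_word_sim: np.ndarray, familiarity: np.ndarray) -> int:
    """novel_word_sim[i]: similarity of the novel word to object i.
    familiarity[i]: how strongly object i is already named by a known word.
    An ME-biased learner picks a plausible but not-yet-named object."""
    return int(np.argmax(novel_word_sim - familiarity))

# Object 0 is a familiar "ball"; object 1 is unseen. The novel word maps to 1.
print(me_choice(np.array([0.5, 0.5]), np.array([0.9, 0.0])))  # -> 1
```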
The remarkable generative capabilities of denoising diffusion models have raised new concerns regarding the authenticity of the images we see every day on the Internet. However, the vast majority of existing deepfake detection models are tested again…
External link:
http://arxiv.org/abs/2311.04584
Towards generalisable and calibrated synthetic speech detection with self-supervised representations
Generalisation -- the ability of a model to perform well on unseen data -- is crucial for building reliable deepfake detectors. However, recent studies have shown that current audio deepfake models fall short of this desideratum. In this work we… (a sketch of one standard calibration metric follows below)
External link:
http://arxiv.org/abs/2309.05384
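Calibration here means that predicted confidences match empirical accuracy. One standard way to quantify it (not necessarily the measure used in the paper) is the expected calibration error (ECE), sketched below.

```python
import numpy as np

def ece(probs: np.ndarray, labels: np.ndarray, n_bins: int = 10) -> float:
    """Expected calibration error with equal-width confidence bins.
    probs: predicted probability of the positive class; labels: 0/1."""
    bin_ids = np.minimum((probs * n_bins).astype(int), n_bins - 1)
    total = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            # weight each bin's |accuracy - confidence| gap by its mass
            total += mask.mean() * abs(labels[mask].mean() - probs[mask].mean())
    return total
```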
We propose a visually grounded speech model that learns new words and their visual depictions from just a few word-image example pairs. Given a set of test images and a spoken query, we ask the model which image depicts the query word. Previous work… (a sketch of the test-time task follows below)
External link:
http://arxiv.org/abs/2306.11371
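A hypothetical evaluation loop for the test-time task described above: for each spoken query, pick the test image whose embedding is most similar to the query embedding. The shared embedding space is assumed, and the embedding arrays stand in for the paper's model.

```python
import numpy as np

def few_shot_accuracy(query_embs: np.ndarray, image_embs: np.ndarray,
                      answers: np.ndarray) -> float:
    """query_embs: (Q, D) speech embeddings; image_embs: (N, D) test images;
    answers[i]: index of the image that depicts query i."""
    imgs = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    correct = 0
    for q_emb, gold in zip(query_embs, answers):
        q = q_emb / np.linalg.norm(q_emb)
        correct += int(np.argmax(imgs @ q) == gold)
    return correct / len(answers)
```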
Most vision-and-language pretraining research focuses on English tasks. However, the creation of multilingual multimodal evaluation datasets (e.g. Multi30K, xGQA, XVNLI, and MaRVL) poses a new challenge in finding high-quality training data that is…
External link:
http://arxiv.org/abs/2210.13134
Visually grounded speech (VGS) models are trained on images paired with unlabelled spoken captions. Such models could be used to build speech systems in settings where it is impossible to get labelled data, e.g. for documenting unwritten languages… (a sketch of a typical training objective follows below)
External link:
http://arxiv.org/abs/2210.04600
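VGS models of this kind are commonly trained with a contrastive objective that pulls paired image and speech embeddings together. Below is an InfoNCE-style sketch of that idea; the paper's exact loss and architecture may differ.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(img: torch.Tensor, speech: torch.Tensor,
                     temp: float = 0.07) -> torch.Tensor:
    """img, speech: (B, D) L2-normalised embeddings of paired examples."""
    logits = img @ speech.t() / temp     # (B, B) similarity matrix
    targets = torch.arange(img.size(0))  # matched pairs lie on the diagonal
    # symmetric cross-entropy: image-to-speech and speech-to-image retrieval
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```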
Published in:
Sensors. 2022; 22(11):4104
The task of converting text input into video content is becoming an important topic for synthetic media generation. Several methods have been proposed, some of them reaching close-to-natural performance in constrained tasks. In this paper, we…
External link:
http://arxiv.org/abs/2206.03206