Showing 1 - 10 of 113 for search: '"Tsiamas, P."'
Video-to-audio (V2A) generation leverages visual-only video features to render plausible sounds that match the scene. Importantly, the generated sound onsets should match the visual actions that are aligned with them, otherwise unnatural synchronization …
External link:
http://arxiv.org/abs/2407.10387
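As a quick illustration of the synchrony requirement this abstract describes, here is a hedged sketch that scores how far detected audio onsets fall from known visual action times. It uses librosa's standard onset detector; the function name, click-track demo, and event times are illustrative, not the paper's evaluation protocol.

```python
# Sketch: measure audio/visual onset synchrony (illustrative, not the paper's metric).
import numpy as np
import librosa

def onset_sync_error(audio, sr, visual_event_times):
    """Mean gap (s) between each visual event and its nearest audio onset."""
    onsets = librosa.onset.onset_detect(y=audio, sr=sr, units="time")
    if len(onsets) == 0:
        return float("inf")
    return float(np.mean([np.min(np.abs(onsets - t)) for t in visual_event_times]))

# Demo: a synthetic click track stands in for generated audio,
# with "visual actions" assumed at 0.5 s and 1.2 s.
sr = 16000
audio = librosa.clicks(times=[0.5, 1.2], sr=sr, length=int(1.5 * sr))
print(f"mean onset error: {onset_sync_error(audio, sr, [0.5, 1.2]):.3f} s")
```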
Contrastive learning has emerged as a powerful technique in audio-visual representation learning, leveraging the natural co-occurrence of audio and visual modalities in extensive web-scale video datasets to achieve significant advancements. However, …
External link:
http://arxiv.org/abs/2407.05782
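A minimal sketch of the contrastive setup the abstract refers to, assuming an InfoNCE-style symmetric loss over paired audio/video embeddings; the batch size, embedding dimension, and temperature are placeholders, not the paper's configuration.

```python
# Sketch: symmetric InfoNCE loss for paired audio/video clips.
import torch
import torch.nn.functional as F

def infonce_loss(audio_emb, video_emb, temperature=0.07):
    """audio_emb, video_emb: (batch, dim) outputs of separate encoders.
    Matching rows are positives; every other row in the batch is a negative."""
    a = F.normalize(audio_emb, dim=-1)
    v = F.normalize(video_emb, dim=-1)
    logits = a @ v.t() / temperature            # (batch, batch) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)
    # Positives sit on the diagonal; apply cross-entropy in both directions.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Usage with random embeddings standing in for encoder outputs:
loss = infonce_loss(torch.randn(8, 512), torch.randn(8, 512))
```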
Data scarcity and the modality gap between the speech and text modalities are two major obstacles for end-to-end Speech Translation (ST) systems, hindering their performance. Prior work has attempted to mitigate these challenges by leveraging external …
External link:
http://arxiv.org/abs/2402.10422
This paper describes the submission of the UPC Machine Translation group to the IWSLT 2023 Offline Speech Translation task. Our Speech Translation systems utilize foundation models for speech (wav2vec 2.0) and text (mBART50). We incorporate a Siamese …
External link:
http://arxiv.org/abs/2306.01327
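A hedged sketch of pairing the two foundation models the abstract names, using standard Hugging Face checkpoint IDs that may differ from the submission's exact ones; the projection and Siamese alignment step are only indicated in comments, since the abstract is truncated before the details.

```python
# Sketch: a speech encoder (wav2vec 2.0) feeding a text model (mBART50).
import torch
from transformers import Wav2Vec2Model, MBartForConditionalGeneration

speech_enc = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-large-960h")
text_model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50")

wav = torch.randn(1, 16000)                        # 1 s of 16 kHz audio (dummy input)
speech_feats = speech_enc(wav).last_hidden_state   # (1, frames, 1024)
# A learned projection would map speech_feats into mBART's embedding space;
# the Siamese step the abstract mentions aligns the speech and text encoders.
```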
Language Generation Models produce words based on the previous context. Although existing methods offer input attributions as explanations for a model's prediction, it is still unclear how prior words affect the model's decision throughout the layers …
External link:
http://arxiv.org/abs/2305.12535
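To make the notion of input attributions concrete, here is a generic gradient-times-embedding sketch for a small language model; this is one common attribution technique, not necessarily the method the paper proposes, and GPT-2 is used only as a stand-in.

```python
# Sketch: score how each prior word contributes to the next-token prediction.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

ids = tok("The capital of France is", return_tensors="pt").input_ids
emb = model.transformer.wte(ids).detach().requires_grad_(True)
logits = model(inputs_embeds=emb).logits[0, -1]    # next-token distribution
logits[logits.argmax()].backward()                 # grad of the top prediction

scores = (emb.grad * emb).sum(-1).squeeze(0)       # one attribution per token
for t, s in zip(tok.convert_ids_to_tokens(ids[0]), scores.tolist()):
    print(f"{t:>10s}  {s:+.3f}")
```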
End-to-end Speech Translation is hindered by a lack of available data resources. While most of them are document-based, a sentence-level version is available, which is, however, single and static, potentially limiting the usefulness of the data. We …
External link:
http://arxiv.org/abs/2212.09699
Transformers have been the dominant architecture for Speech Translation in recent years, achieving significant improvements in translation quality. Since speech signals are longer than their textual counterparts, and due to the quadratic complexity of attention …
External link:
http://arxiv.org/abs/2210.16264
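The quadratic-cost point is easy to see with a back-of-the-envelope comparison; the sequence lengths below are typical orders of magnitude, not figures from the paper.

```python
# Self-attention cost scales with sequence length squared, and speech
# feature sequences are far longer than their transcripts.
text_len = 30        # tokens in a sentence
speech_len = 1500    # ~30 s of audio at ~50 feature frames per second

for name, n in [("text", text_len), ("speech", speech_len)]:
    print(f"{name}: attention matrix has {n * n:,} entries")
# speech is 50x longer here, so attention is 2500x more expensive
```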
Speech translation models are unable to directly process long audios, like TED talks, which have to be split into shorter segments. Speech translation datasets provide manual segmentations of the audios, which are not available in real-world scenarios …
External link:
http://arxiv.org/abs/2202.04774
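As a baseline for the segmentation problem the abstract raises, here is a naive fixed-length splitter; the window and stride values are illustrative, and a learned, content-aware segmenter would replace it in practice.

```python
# Sketch: split a long audio signal into fixed-length windows.
import numpy as np

def segment(audio, sr=16000, window_s=20.0, stride_s=20.0):
    """Yield consecutive windows of a 1-D audio array."""
    win, hop = int(window_s * sr), int(stride_s * sr)
    for start in range(0, len(audio), hop):
        yield audio[start:start + win]

chunks = list(segment(np.zeros(16000 * 65)))   # 65 s -> three 20 s pieces + a 5 s tail
print([len(c) / 16000 for c in chunks])
```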
Author:
Gállego, Gerard I., Tsiamas, Ioannis, Escolano, Carlos, Fonollosa, José A. R., Costa-jussà, Marta R.
This paper describes the submission to the IWSLT 2021 offline speech translation task by the UPC Machine Translation group. The task consists of building a system capable of translating English audio recordings extracted from TED talks into German text …
External link:
http://arxiv.org/abs/2105.04512
Author:
Tsiamas, P., Sajo, E., Cifter, F., Theodorou, K., Kappas, C., Makrigiorgos, M., Marcus, K., Zygmanski, P.
Published in:
In Physica Medica, February 2014, 30(1):47-56