Showing 1 - 10 of 480 results for the search: '"Serra, Joan"'
Video-to-audio (V2A) generation leverages visual-only video features to render plausible sounds that match the scene. Importantly, the generated sound onsets should match the visual actions that are aligned with them, otherwise unnatural synchronization …
External link:
http://arxiv.org/abs/2407.10387
Contrastive learning has emerged as a powerful technique in audio-visual representation learning, leveraging the natural co-occurrence of audio and visual modalities in extensive web-scale video datasets to achieve significant advancements. However, …
External link:
http://arxiv.org/abs/2407.05782
Universal source separation aims at separating the audio sources of an arbitrary mix, removing the constraint to operate on a specific domain like speech or music. Yet, the potential of universal source separation is limited because most existing …
External link:
http://arxiv.org/abs/2310.00140
Author:
Serrà, Joan, Scaini, Davide, Pascual, Santiago, Arteaga, Daniel, Pons, Jordi, Breebaart, Jeroen, Cengarle, Giulio
Generating a stereophonic presentation from a monophonic audio signal is a challenging open task, especially if the goal is to obtain a realistic spatial imaging with a specific panning of sound elements. In this work, we propose to convert mono to stereo …
External link:
http://arxiv.org/abs/2306.14647
Author:
Dong, Hao-Wen, Liu, Xiaoyu, Pons, Jordi, Bhattacharya, Gautam, Pascual, Santiago, Serrà, Joan, Berg-Kirkpatrick, Taylor, McAuley, Julian
Recent work has studied text-to-audio synthesis using large amounts of paired text-audio data. However, audio recordings with high-quality text annotations can be difficult to acquire. In this work, we approach text-to-audio synthesis using unlabeled …
External link:
http://arxiv.org/abs/2306.09635
Recent works have shown the capability of deep generative models to tackle general audio synthesis from a single label, producing a variety of impulsive, tonal, and environmental sounds. Such models operate on band-limited signals and, as a result of …
External link:
http://arxiv.org/abs/2210.14661
Single channel target speaker separation (TSS) aims at extracting a speaker's voice from a mixture of multiple talkers given an enrollment utterance of that speaker. A typical deep learning TSS framework consists of an upstream model that obtains enrollment …
External link:
http://arxiv.org/abs/2210.12635
Universal sound separation consists of separating mixes with arbitrary sounds of different types, and permutation invariant training (PIT) is used to train source-agnostic models that do so. In this work, we complement PIT with adversarial losses but …
External link:
http://arxiv.org/abs/2210.12108