Výsledky vyhledávání

Report

MusicFlow: Cascaded Flow Matching for Text Guided Music Generation

Autor: Prajwal, K R, Shi, Bowen, Lee, Matthew, Vyas, Apoorv, Tjandra, Andros, Luthra, Mahi, Guo, Baishan, Wang, Huiyu, Afouras, Triantafyllos, Kant, David, Hsu, Wei-Ning

We introduce MusicFlow, a cascaded text-to-music generation model based on flow matching. Based on self-supervised representations to bridge between text descriptions and music audios, we construct two flow matching networks to model the conditional

Externí odkaz: http://arxiv.org/abs/2410.20478

Zobrazit plný text záznamu

Report

A Tale of Two Languages: Large-Vocabulary Continuous Sign Language Recognition from Spoken Language Supervision

Autor: Raude, Charles, Prajwal, K R, Momeni, Liliane, Bull, Hannah, Albanie, Samuel, Zisserman, Andrew, Varol, Gül

In this work, our goals are two fold: large-vocabulary continuous sign language recognition (CSLR), and sign language retrieval. To this end, we introduce a multi-task Transformer model, CSLR2, that is able to ingest a signing sequence and output in

Externí odkaz: http://arxiv.org/abs/2405.10266

Zobrazit plný text záznamu

Akademický článek

High‐Frequency Gravity Waves and Kelvin‐Helmholtz Billows in the Tropical UTLS, as Seen From Radar Observations of Vertical Wind

Autor: Ajil Kottayil, Aurélien Podglajen, Bernard Legras, Rachel Atlas, Prajwal K, K. Satheesan, Abhilash S

Publikováno v: Geophysical Research Letters, Vol 51, Iss 21, Pp n/a-n/a (2024)

Abstract The present study analyzes novel observations of vertical wind (w) in the tropical upper troposphere‐lower stratosphere obtained from a radar wind profiler in Cochin, India. Between December 2022 and April 2023, 63 consecutive 4 hr curtain

Externí odkaz: https://doaj.org/article/606c1eaf65c0468c9d4b226142c9f20a

Zobrazit plný text záznamu

Report

Weakly-supervised Fingerspelling Recognition in British Sign Language Videos

Autor: Prajwal, K R, Bull, Hannah, Momeni, Liliane, Albanie, Samuel, Varol, Gül, Zisserman, Andrew

The goal of this work is to detect and recognize sequences of letters signed using fingerspelling in British Sign Language (BSL). Previous fingerspelling recognition methods have not focused on BSL, which has a very different signing alphabet (e.g.,

Externí odkaz: http://arxiv.org/abs/2211.08954

Zobrazit plný text záznamu

Report

Lip-to-Speech Synthesis for Arbitrary Speakers in the Wild

Autor: Hegde, Sindhu B, Prajwal, K R, Mukhopadhyay, Rudrabha, Namboodiri, Vinay P, Jawahar, C. V.

In this work, we address the problem of generating speech from silent lip videos for any speaker in the wild. In stark contrast to previous works, our method (i) is not restricted to a fixed number of speakers, (ii) does not explicitly impose constra

Externí odkaz: http://arxiv.org/abs/2209.00642

Zobrazit plný text záznamu

Report

Automatic dense annotation of large-vocabulary sign language videos

Autor: Momeni, Liliane, Bull, Hannah, Prajwal, K R, Albanie, Samuel, Varol, Gül, Zisserman, Andrew

Recently, sign language researchers have turned to sign language interpreted TV broadcasts, comprising (i) a video of continuous signing and (ii) subtitles corresponding to the audio content, as a readily available and large-scale source of training

Externí odkaz: http://arxiv.org/abs/2208.02802

Zobrazit plný text záznamu

Report

Visual Keyword Spotting with Attention

Autor: Prajwal, K R, Momeni, Liliane, Afouras, Triantafyllos, Zisserman, Andrew

In this paper, we consider the task of spotting spoken keywords in silent video sequences -- also known as visual keyword spotting. To this end, we investigate Transformer-based models that ingest two streams, a visual encoding of the video and a pho

Externí odkaz: http://arxiv.org/abs/2110.15957

Zobrazit plný text záznamu

Report

Sub-word Level Lip Reading With Visual Attention

Autor: Prajwal, K R, Afouras, Triantafyllos, Zisserman, Andrew

The goal of this paper is to learn strong lip reading models that can recognise speech in silent videos. Most prior works deal with the open-set visual speech recognition problem by adapting existing automatic speech recognition techniques on top of

Externí odkaz: http://arxiv.org/abs/2110.07603

Zobrazit plný text záznamu

Report

Visual Speech Enhancement Without A Real Visual Stream

Autor: Hegde, Sindhu B, Prajwal, K R, Mukhopadhyay, Rudrabha, Namboodiri, Vinay, Jawahar, C. V.

In this work, we re-think the task of speech enhancement in unconstrained real-world environments. Current state-of-the-art methods use only the audio stream and are limited in their performance in a wide range of real-world noises. Recent works usin

Externí odkaz: http://arxiv.org/abs/2012.10852

Zobrazit plný text záznamu

Report

A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild

Autor: Prajwal, K R, Mukhopadhyay, Rudrabha, Namboodiri, Vinay, Jawahar, C V

In this work, we investigate the problem of lip-syncing a talking face video of an arbitrary identity to match a target speech segment. Current works excel at producing accurate lip movements on a static image or videos of specific people seen during

Externí odkaz: http://arxiv.org/abs/2008.10010

Zobrazit plný text záznamu

Vyhledávací nástroje:

Upřesnit hledání