Showing 1 - 10 of 37 for search: '"Huh, Jaesung"'
Diverse actions give rise to rich audio-visual signals in long videos. Recent works showcase that the two modalities of audio and video exhibit different temporal extents of events and distinct labels. We address the interplay between the two modalities…
External link:
http://arxiv.org/abs/2404.05559
The goal of this paper is automatic character-aware subtitle generation. Given a video and a minimal amount of metadata, we propose an audio-visual method that generates a full transcript of the dialogue, with precise speech timestamps, and the character…
External link:
http://arxiv.org/abs/2401.12039
This report presents the technical details of our submission to the EGO4D Audio-Visual (AV) Automatic Speech Recognition Challenge 2023 from the OxfordVGG team. We present WhisperX, a system for efficient speech transcription of long-form audio with…
External link:
http://arxiv.org/abs/2307.09006
Large-scale, weakly-supervised speech recognition models, such as Whisper, have demonstrated impressive results on speech recognition across domains and languages. However, their application to long audio transcription via buffered or sliding window…
External link:
http://arxiv.org/abs/2303.00747
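As an illustration of the long-form transcription workflow described above, here is a minimal sketch following the usage shown in the WhisperX repository (https://github.com/m-bain/whisperX). The function names follow its README but may differ across versions, and the audio file path is a placeholder.

```python
# Minimal WhisperX sketch: batched transcription of long-form audio,
# then forced alignment for accurate word-level timestamps.
# API follows the WhisperX README; details may vary by version.
import whisperx

device = "cuda"                        # or "cpu"
audio_file = "long_recording.wav"      # placeholder path

# 1. Transcribe with the batched Whisper backend. VAD-based chunking
#    avoids the drift of naive buffered/sliding-window decoding.
model = whisperx.load_model("large-v2", device, compute_type="float16")
audio = whisperx.load_audio(audio_file)
result = model.transcribe(audio, batch_size=16)

# 2. Align the transcript with a phoneme model for precise timestamps.
align_model, metadata = whisperx.load_align_model(
    language_code=result["language"], device=device
)
aligned = whisperx.align(result["segments"], align_model, metadata,
                         audio, device)

for seg in aligned["segments"]:
    print(f'[{seg["start"]:7.2f} - {seg["end"]:7.2f}] {seg["text"]}')
```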
Author:
Huh, Jaesung; Brown, Andrew; Jung, Jee-weon; Chung, Joon Son; Nagrani, Arsha; Garcia-Romero, Daniel; Zisserman, Andrew
This paper summarises the findings from the VoxCeleb Speaker Recognition Challenge 2022 (VoxSRC-22), which was held in conjunction with INTERSPEECH 2022. The goal of this challenge was to evaluate how well state-of-the-art speaker recognition systems…
External link:
http://arxiv.org/abs/2302.10248
We introduce EPIC-SOUNDS, a large-scale dataset of audio annotations capturing temporal extents and class labels within the audio stream of the egocentric videos. We propose an annotation pipeline where annotators temporally label distinguishable audio…
External link:
http://arxiv.org/abs/2302.00646
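To make the shape of such temporal audio annotations concrete, the snippet below reads a hypothetical annotation file with pandas and sums the labelled duration per class. The file name and column names are illustrative assumptions, not the actual EPIC-SOUNDS schema.

```python
# Hypothetical sketch: load temporal audio annotations and report the
# total labelled duration per class. Column names are assumed here and
# do not necessarily match the released EPIC-SOUNDS files.
import pandas as pd

ann = pd.read_csv("audio_annotations.csv")   # placeholder file name

# Assumed columns: 'class', 'start_sec', 'stop_sec'
ann["duration_sec"] = ann["stop_sec"] - ann["start_sec"]
per_class = ann.groupby("class")["duration_sec"].sum()
print(per_class.sort_values(ascending=False).head(10))
```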
The goal of this paper is to learn robust speaker representations for a bilingual speaking scenario. The majority of the world's population speak at least two languages; however, most speaker recognition systems fail to recognise the same speaker when…
External link:
http://arxiv.org/abs/2211.00437
Author:
Jung, Jee-weon; Heo, Hee-Soo; Lee, Bong-Jin; Huh, Jaesung; Brown, Andrew; Kwon, Youngki; Watanabe, Shinji; Chung, Joon Son
Speaker embedding extractors (EEs), which map input audio to a speaker discriminant latent space, are of paramount importance in speaker diarisation. However, there are several challenges when adopting EEs for diarisation, from which we tackle two key…
External link:
http://arxiv.org/abs/2210.14682
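For readers unfamiliar with the embedding-and-compare pattern that underlies diarisation, the sketch below scores two utterances with a pretrained ECAPA-TDNN extractor from SpeechBrain. This is a generic illustration, not the method proposed in the paper; the checkpoint name follows SpeechBrain's published models, and the audio paths are placeholders.

```python
# Generic speaker-embedding sketch (not this paper's method): map two
# utterances into a speaker-discriminant latent space and compare them
# with cosine similarity, the basic operation behind diarisation.
import torch
import torchaudio
from speechbrain.pretrained import EncoderClassifier

# Published SpeechBrain checkpoint; expects 16 kHz mono input.
encoder = EncoderClassifier.from_hparams(
    source="speechbrain/spkrec-ecapa-voxceleb"
)

def embed(path: str) -> torch.Tensor:
    signal, sample_rate = torchaudio.load(path)
    return encoder.encode_batch(signal).squeeze()

e1 = embed("utterance_a.wav")   # placeholder paths
e2 = embed("utterance_b.wav")
score = torch.nn.functional.cosine_similarity(e1, e2, dim=0)
print(f"same-speaker similarity: {score.item():.3f}")
```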
Author:
Brown, Andrew; Huh, Jaesung; Chung, Joon Son; Nagrani, Arsha; Garcia-Romero, Daniel; Zisserman, Andrew
The third instalment of the VoxCeleb Speaker Recognition Challenge was held in conjunction with Interspeech 2021. The aim of this challenge was to assess how well current speaker recognition technology is able to diarise and recognise speakers in unconstrained…
External link:
http://arxiv.org/abs/2201.04583
In egocentric videos, actions occur in quick succession. We capitalise on the action's temporal context and propose a method that learns to attend to surrounding actions in order to improve recognition performance. To incorporate the temporal context…
External link:
http://arxiv.org/abs/2111.01024