Search results

Report

Disentangled Representation Learning for Environment-agnostic Speaker Recognition

Authors: Nam, KiHyun; Heo, Hee-Soo; Jung, Jee-weon; Chung, Joon Son

This work presents a framework based on feature disentanglement to learn speaker embeddings that are robust to environmental variations. Our framework utilises an auto-encoder as a disentangler, dividing the input speaker embedding into components re

External link: http://arxiv.org/abs/2406.14559

Report

Rethinking Session Variability: Leveraging Session Embeddings for Session Robustness in Speaker Verification

Authors: Heo, Hee-Soo; Nam, KiHyun; Lee, Bong-Jin; Kwon, Youngki; Lee, Minjae; Kim, You Jin; Chung, Joon Son

In the field of speaker verification, session or channel variability poses a significant challenge. While many contemporary methods aim to disentangle session information from speaker embeddings, we introduce a novel approach using an additional embe

External link: http://arxiv.org/abs/2309.14741

Report

TalkNCE: Improving Active Speaker Detection with Talk-Aware Contrastive Learning

Authors: Jung, Chaeyoung; Lee, Suyeon; Nam, Kihyun; Rho, Kyeongha; Kim, You Jin; Jang, Youngjoon; Chung, Joon Son

The goal of this work is Active Speaker Detection (ASD), a task to determine whether a person is speaking or not in a series of video frames. Previous works have dealt with the task by exploring network architectures while learning effective represen

External link: http://arxiv.org/abs/2309.12306

Report

Disentangled representation learning for multilingual speaker recognition

Authors: Nam, Kihyun; Kim, Youkyum; Huh, Jaesung; Heo, Hee Soo; Jung, Jee-weon; Chung, Joon Son

The goal of this paper is to learn robust speaker representation for bilingual speaking scenario. The majority of the world's population speak at least two languages; however, most speaker recognition systems fail to recognise the same speaker when s

External link: http://arxiv.org/abs/2211.00437

Report

ClovaCall: Korean Goal-Oriented Dialog Speech Corpus for Automatic Speech Recognition of Contact Centers

Authors: Ha, Jung-Woo; Nam, Kihyun; Kang, Jingu; Lee, Sang-Woo; Yang, Sohee; Jung, Hyunhoon; Kim, Eunmi; Kim, Hyeji; Kim, Soojin; Kim, Hyun Ah; Doh, Kyoungtae; Lee, Chan Kyu; Sung, Nako; Kim, Sunghun

Automatic speech recognition (ASR) via call is essential for various applications, including AI for contact center (AICC) services. Despite the advancement of ASR, however, most publicly available call-based speech corpora such as Switchboard are old

External link: http://arxiv.org/abs/2004.09367
