Showing 1 - 10
of 488
for the search: '"In-Joon Son"'
Published in:
Applied Sciences, Vol 13, Iss 13, p 7887 (2023)
Zinc (Zn) coatings, which are widely used to protect metals from corrosion, can be further improved by alloying with nickel (Ni). Increasing the Ni content enhances the corrosion-resistant properties of the Zn coating. This study investigated the eff…
External link:
https://doaj.org/article/28aad7ff7318458791986a2f79582e2b
Author:
Huh, Jaesung, Chung, Joon Son, Nagrani, Arsha, Brown, Andrew, Jung, Jee-weon, Garcia-Romero, Daniel, Zisserman, Andrew
The VoxCeleb Speaker Recognition Challenges (VoxSRC) were a series of challenges and workshops that ran annually from 2019 to 2023. The challenges primarily evaluated the tasks of speaker recognition and diarisation under various settings including: …
External link:
http://arxiv.org/abs/2408.14886
This paper proposes a novel user-defined keyword spotting framework that accurately detects audio keywords based on text enrollment. Since audio data possesses additional acoustic information compared to text, there are discrepancies between these tw…
External link:
http://arxiv.org/abs/2408.03593
Author:
Ahn, Junseok, Kim, Youkyum, Choi, Yeunju, Kwak, Doyeop, Kim, Ji-Hoon, Mun, Seongkyu, Chung, Joon Son
This paper introduces VoxSim, a dataset of perceptual voice similarity ratings. Recent efforts to automate the assessment of speech synthesis technologies have primarily focused on predicting mean opinion score of naturalness, leaving speaker voice s…
External link:
http://arxiv.org/abs/2407.18505
Author:
Senocak, Arda, Ryu, Hyeonggon, Kim, Junsik, Oh, Tae-Hyun, Pfister, Hanspeter, Chung, Joon Son
Recent studies on learning-based sound source localization have mainly focused on the localization performance perspective. However, prior work and existing benchmarks overlook a crucial aspect: cross-modal interaction, which is essential for interac…
External link:
http://arxiv.org/abs/2407.13676
Transformers have rapidly overtaken CNN-based architectures as the new standard in audio classification. Transformer-based models, such as the Audio Spectrogram Transformers (AST), also inherit the fixed-size input paradigm from CNNs. However, this l…
External link:
http://arxiv.org/abs/2407.08691
This work presents a framework based on feature disentanglement to learn speaker embeddings that are robust to environmental variations. Our framework utilises an auto-encoder as a disentangler, dividing the input speaker embedding into components re…
External link:
http://arxiv.org/abs/2406.14559
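The abstract above describes splitting a speaker embedding into separate components with an auto-encoder. As a rough illustration only (the actual model, its training objective, and all names here are not from the paper), a minimal NumPy sketch of splitting an embedding into two additive components and reconstructing the input:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8  # toy embedding dimension (illustrative)

# Hypothetical "disentangler": two linear heads split the input embedding
# into a speaker-related part and an environment-related part; a toy
# decoder (here, simple addition) reconstructs the original embedding.
W_spk = rng.standard_normal((dim, dim))
W_env = np.eye(dim) - W_spk  # chosen so the two parts sum back exactly

def disentangle(embedding):
    """Split an embedding into (speaker, environment) components."""
    return W_spk @ embedding, W_env @ embedding

def reconstruct(spk, env):
    """Toy decoder: the two components sum back to the input."""
    return spk + env

x = rng.standard_normal(dim)
spk, env = disentangle(x)
assert np.allclose(reconstruct(spk, env), x)
```

In the paper's setting the split would be learned with a reconstruction loss rather than fixed linear maps; this sketch only shows the decomposition idea.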
Speech segmentation is an essential part of speech translation (ST) systems in real-world scenarios. Since most ST models are designed to process speech segments, long-form audio must be partitioned into shorter segments before translation. Recently, …
External link:
http://arxiv.org/abs/2406.10549
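The abstract above notes that long-form audio must be partitioned into shorter segments before translation. As a baseline illustration (not the paper's proposed method; window and overlap lengths are arbitrary), a fixed-length segmentation with overlap can be sketched as:

```python
def segment(duration_s, max_len_s=30.0, overlap_s=1.0):
    """Return (start, end) windows in seconds covering [0, duration_s],
    each at most max_len_s long, with overlap_s of overlap between
    consecutive windows. Parameter values are illustrative."""
    step = max_len_s - overlap_s
    segments = []
    start = 0.0
    while start < duration_s:
        end = min(start + max_len_s, duration_s)
        segments.append((start, end))
        if end >= duration_s:
            break
        start += step
    return segments

print(segment(70.0))
# → [(0.0, 30.0), (29.0, 59.0), (58.0, 70.0)]
```

Learned segmenters instead place boundaries at pauses or sentence-like units, which is the problem setting such papers address.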
This work proposes an efficient method to enhance the quality of corrupted speech signals by leveraging both acoustic and visual cues. While existing diffusion-based approaches have demonstrated remarkable quality, their applicability is limited by s…
External link:
http://arxiv.org/abs/2406.09286
Author:
Jung, Jee-weon, Wang, Xin, Evans, Nicholas, Watanabe, Shinji, Shim, Hye-jin, Tak, Hemlata, Arora, Siddhant, Yamagishi, Junichi, Chung, Joon Son
The current automatic speaker verification (ASV) task involves making binary decisions on two types of trials: target and non-target. However, emerging advancements in speech generation technology pose significant threats to the reliability of ASV sy…
External link:
http://arxiv.org/abs/2406.05339
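The abstract above describes ASV as a binary decision over target and non-target trials. As a generic illustration of that decision (standard cosine-similarity scoring against a threshold, not the method proposed in the paper; the threshold value is arbitrary):

```python
import numpy as np

def cosine_score(e1, e2):
    """Cosine similarity between two speaker embeddings."""
    return float(np.dot(e1, e2) / (np.linalg.norm(e1) * np.linalg.norm(e2)))

def verify(enroll, test, threshold=0.5):
    """Binary ASV decision: accept the trial as a target (same speaker)
    if the similarity score exceeds the threshold (value illustrative)."""
    return cosine_score(enroll, test) >= threshold

enroll = np.array([1.0, 0.0, 0.0])
assert verify(enroll, np.array([0.9, 0.1, 0.0]))      # target-like trial
assert not verify(enroll, np.array([0.0, 1.0, 0.0]))  # non-target trial
```

Spoofed or synthesised speech can score above the threshold despite coming from no genuine target speaker, which is the reliability threat the abstract refers to.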