Showing 1 - 10
of 488
for the search: '"In-Joon Son"'
Published in:
Applied Sciences, Vol 13, Iss 13, p 7887 (2023)
Zinc (Zn) coatings, which are widely used to protect metals from corrosion, can be further improved by alloying with nickel (Ni). Increasing the Ni content enhances the corrosion-resistant properties of the Zn coating. This study investigated the eff…
External link:
https://doaj.org/article/28aad7ff7318458791986a2f79582e2b
Author:
Huh, Jaesung, Chung, Joon Son, Nagrani, Arsha, Brown, Andrew, Jung, Jee-weon, Garcia-Romero, Daniel, Zisserman, Andrew
The VoxCeleb Speaker Recognition Challenges (VoxSRC) were a series of challenges and workshops that ran annually from 2019 to 2023. The challenges primarily evaluated the tasks of speaker recognition and diarisation under various settings including: …
External link:
http://arxiv.org/abs/2408.14886
This paper proposes a novel user-defined keyword spotting framework that accurately detects audio keywords based on text enrollment. Since audio data possesses additional acoustic information compared to text, there are discrepancies between these tw…
External link:
http://arxiv.org/abs/2408.03593
Author:
Ahn, Junseok, Kim, Youkyum, Choi, Yeunju, Kwak, Doyeop, Kim, Ji-Hoon, Mun, Seongkyu, Chung, Joon Son
This paper introduces VoxSim, a dataset of perceptual voice similarity ratings. Recent efforts to automate the assessment of speech synthesis technologies have primarily focused on predicting mean opinion score of naturalness, leaving speaker voice s…
External link:
http://arxiv.org/abs/2407.18505
Author:
Senocak, Arda, Ryu, Hyeonggon, Kim, Junsik, Oh, Tae-Hyun, Pfister, Hanspeter, Chung, Joon Son
Recent studies on learning-based sound source localization have mainly focused on the localization performance perspective. However, prior work and existing benchmarks overlook a crucial aspect: cross-modal interaction, which is essential for interac…
External link:
http://arxiv.org/abs/2407.13676
Transformers have rapidly overtaken CNN-based architectures as the new standard in audio classification. Transformer-based models, such as the Audio Spectrogram Transformers (AST), also inherit the fixed-size input paradigm from CNNs. However, this l…
External link:
http://arxiv.org/abs/2407.08691
This work presents a framework based on feature disentanglement to learn speaker embeddings that are robust to environmental variations. Our framework utilises an auto-encoder as a disentangler, dividing the input speaker embedding into components re…
External link:
http://arxiv.org/abs/2406.14559
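The abstract above describes splitting a speaker embedding into separate components with an auto-encoder. As a rough illustration only (the actual model, its training objective, and all names here are not from the paper), a minimal NumPy sketch of splitting an embedding into two additive components and reconstructing the input:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8  # toy embedding dimension (illustrative)

# Hypothetical "disentangler": two linear heads split the input embedding
# into a speaker-related part and an environment-related part; a toy
# decoder (here, simple addition) reconstructs the original embedding.
W_spk = rng.standard_normal((dim, dim))
W_env = np.eye(dim) - W_spk  # chosen so the two parts sum back exactly

def disentangle(embedding):
    """Split an embedding into (speaker, environment) components."""
    return W_spk @ embedding, W_env @ embedding

def reconstruct(spk, env):
    """Toy decoder: the two components sum back to the input."""
    return spk + env

x = rng.standard_normal(dim)
spk, env = disentangle(x)
assert np.allclose(reconstruct(spk, env), x)
```

In the paper's setting the split would be learned with a reconstruction loss rather than fixed linear maps; this sketch only shows the decomposition idea.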
Speech segmentation is an essential part of speech translation (ST) systems in real-world scenarios. Since most ST models are designed to process speech segments, long-form audio must be partitioned into shorter segments before translation. Recently, …
External link:
http://arxiv.org/abs/2406.10549
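The abstract above notes that long-form audio must be partitioned into shorter segments before translation. As a baseline illustration (not the paper's proposed method; window and overlap lengths are arbitrary), a fixed-length segmentation with overlap can be sketched as:

```python
def segment(duration_s, max_len_s=30.0, overlap_s=1.0):
    """Return (start, end) windows in seconds covering [0, duration_s],
    each at most max_len_s long, with overlap_s of overlap between
    consecutive windows. Parameter values are illustrative."""
    step = max_len_s - overlap_s
    segments = []
    start = 0.0
    while start < duration_s:
        end = min(start + max_len_s, duration_s)
        segments.append((start, end))
        if end >= duration_s:
            break
        start += step
    return segments

print(segment(70.0))
# → [(0.0, 30.0), (29.0, 59.0), (58.0, 70.0)]
```

Learned segmenters instead place boundaries at pauses or sentence-like units, which is the problem setting such papers address.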
This work proposes an efficient method to enhance the quality of corrupted speech signals by leveraging both acoustic and visual cues. While existing diffusion-based approaches have demonstrated remarkable quality, their applicability is limited by s…
External link:
http://arxiv.org/abs/2406.09286
Author:
Jung, Jee-weon, Wang, Xin, Evans, Nicholas, Watanabe, Shinji, Shim, Hye-jin, Tak, Hemlata, Arora, Siddhant, Yamagishi, Junichi, Chung, Joon Son
The current automatic speaker verification (ASV) task involves making binary decisions on two types of trials: target and non-target. However, emerging advancements in speech generation technology pose significant threats to the reliability of ASV sy…
External link:
http://arxiv.org/abs/2406.05339
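The abstract above describes ASV as a binary decision over target and non-target trials. As a generic illustration of that decision (standard cosine-similarity scoring against a threshold, not the method proposed in the paper; the threshold value is arbitrary):

```python
import numpy as np

def cosine_score(e1, e2):
    """Cosine similarity between two speaker embeddings."""
    return float(np.dot(e1, e2) / (np.linalg.norm(e1) * np.linalg.norm(e2)))

def verify(enroll, test, threshold=0.5):
    """Binary ASV decision: accept the trial as a target (same speaker)
    if the similarity score exceeds the threshold (value illustrative)."""
    return cosine_score(enroll, test) >= threshold

enroll = np.array([1.0, 0.0, 0.0])
assert verify(enroll, np.array([0.9, 0.1, 0.0]))      # target-like trial
assert not verify(enroll, np.array([0.0, 1.0, 0.0]))  # non-target trial
```

Spoofed or synthesised speech can score above the threshold despite coming from no genuine target speaker, which is the reliability threat the abstract refers to.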