Showing 1 - 10 of 545 for search: '"Takahashi, Naoya"'
This paper presents a novel approach to deter unauthorized deepfakes and enable user tracking in generative models, even when the user has full access to the model parameters, by integrating key-based model authentication with watermarking techniques
External link:
http://arxiv.org/abs/2409.07743
In the realm of audio watermarking, it is challenging to simultaneously encode imperceptible messages while enhancing the message capacity and robustness. Although recent advancements in deep learning-based methods bolster the message capacity and ro
External link:
http://arxiv.org/abs/2406.03822
Author:
Shimada, Kazuki, Politis, Archontis, Sudarsanam, Parthasaarathy, Krause, Daniel, Uchida, Kengo, Adavanne, Sharath, Hakala, Aapo, Koyama, Yuichiro, Takahashi, Naoya, Takahashi, Shusuke, Virtanen, Tuomas, Mitsufuji, Yuki
While direction of arrival (DOA) of sound events is generally estimated from multichannel audio data recorded in a microphone array, sound events usually derive from visually perceptible source objects, e.g., sounds of footsteps come from the feet of
External link:
http://arxiv.org/abs/2306.09126
Many existing works on voice conversion (VC) tasks use automatic speech recognition (ASR) models for ensuring linguistic consistency between source and converted samples. However, for the low-data resource domains, training a high-quality ASR remains
External link:
http://arxiv.org/abs/2305.15055
Published in:
EURASIP Journal on Audio, Speech, and Music Processing (JASM), 39 (2024)
This paper presents the crossing scheme (X-scheme) for improving the performance of deep neural network (DNN)-based music source separation (MSS) with almost no increase in computational cost. It consists of three components: (i) multi-domain loss (MDL)
External link:
http://arxiv.org/abs/2305.07855
Image-to-image translation and voice conversion enable the generation of a new facial image and voice while maintaining some of the semantics such as a pose in an image and linguistic content in audio, respectively. They can aid in the content-creati
External link:
http://arxiv.org/abs/2302.13838
The primary goal of an emotional voice conversion (EVC) system is to convert the emotion of a given speech signal from one style to another style without modifying the linguistic content of the signal. Most of the state-of-the-art approaches convert emot
External link:
http://arxiv.org/abs/2302.10536
Recent years have seen progress beyond domain-specific sound separation for speech or music towards universal sound separation for arbitrary sounds. Prior work on universal sound separation has investigated separating a target sound out of an audio m
External link:
http://arxiv.org/abs/2212.07065
Recent progress in deep generative models has improved the quality of voice conversion in the speech domain. However, high-quality singing voice conversion (SVC) of unseen singers remains challenging due to the wider variety of musical expressions in
External link:
http://arxiv.org/abs/2210.11096
Recent progress in deep generative models has improved the quality of neural vocoders in the speech domain. However, generating a high-quality singing voice remains challenging due to a wider variety of musical expressions in pitch, loudness, and pronunc
External link:
http://arxiv.org/abs/2210.07508