Showing 1 - 10 of 313 for search: '"Masato Akagi"'
Published in:
IEEE Access, Vol 11, Pp 84689-84698 (2023)
Fake audio detection (FAD) aims to identify fraudulent speech generated through advanced speech-synthesis techniques. Most current FAD methods rely solely on a deep neural network (DNN) framework with either speech waveforms or commonly used acoustic features…
External link:
https://doaj.org/article/5fbbd8ddd4724a6b93b544bfe2ffd3ca
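At its core, a fake-audio detector of the kind this abstract describes is a binary classifier over speech input. A minimal PyTorch sketch of that setup (the class name, layer sizes, and the log-mel input are our illustrative assumptions, not the paper's architecture):

```python
import torch
import torch.nn as nn

class FADClassifier(nn.Module):
    """Toy fake-audio detector: a small CNN over a log-mel spectrogram
    ending in a single real/fake logit. Illustrative only."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # pool to one vector per utterance
        )
        self.fc = nn.Linear(32, 1)  # logit > 0 -> flagged as fake

    def forward(self, logmel):  # logmel: (batch, 1, n_mels, n_frames)
        return self.fc(self.conv(logmel).flatten(1))
```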
Published in:
IEEE Access, Vol 11, Pp 141573-141584 (2023)
In speech enhancement, accurate phase reconstruction can significantly improve speech quality. While phase-aware speech enhancement methods using the complex ideal ratio mask (cIRM) have shown promise, the estimation difficulty of the phase is shared…
External link:
https://doaj.org/article/eaaa6b59af1947aa8efd515f9e552272
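The cIRM mentioned above is a complex-valued time-frequency mask M defined so that the clean STFT S equals M times the noisy STFT Y, elementwise in the complex domain, which is why it can restore phase as well as magnitude. A minimal numpy sketch of that definition (function names are ours):

```python
import numpy as np

def complex_ideal_ratio_mask(clean_stft, noisy_stft, eps=1e-8):
    """cIRM M such that clean = M * noisy (complex multiplication),
    i.e. M = S * conj(Y) / |Y|^2."""
    return clean_stft * np.conj(noisy_stft) / (np.abs(noisy_stft) ** 2 + eps)

def apply_mask(mask, noisy_stft):
    """Enhanced spectrum: the elementwise complex product recovers both
    the magnitude and the phase of the clean speech."""
    return mask * noisy_stft
```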
Published in:
IEEE Access, Vol 10, Pp 72381-72387 (2022)
This paper evaluates speech emotion and naturalness recognition using deep learning models with multitask and single-task learning approaches. The emotion model accommodates the valence, arousal, and dominance attributes known as dimensional emotions…
External link:
https://doaj.org/article/4451b84aa295423cbd7b66aa5c761af0
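Multitask learning in this setting means a shared encoder with one head per task, trained on a weighted sum of the per-task losses. A hypothetical PyTorch sketch of that pattern (the GRU encoder, layer sizes, and loss weight are placeholders, not the paper's model):

```python
import torch
import torch.nn as nn

class MultiTaskSER(nn.Module):
    """Shared-encoder multitask model: one trunk, two task heads."""
    def __init__(self, n_features=40, hidden=128):
        super().__init__()
        self.encoder = nn.GRU(n_features, hidden, batch_first=True)
        self.emotion_head = nn.Linear(hidden, 3)      # valence, arousal, dominance
        self.naturalness_head = nn.Linear(hidden, 1)  # naturalness score

    def forward(self, x):            # x: (batch, time, n_features)
        _, h = self.encoder(x)       # h: (1, batch, hidden), last hidden state
        h = h.squeeze(0)
        return self.emotion_head(h), self.naturalness_head(h)

def multitask_loss(emo_pred, emo_true, nat_pred, nat_true, alpha=0.5):
    """Weighted sum of the two task losses; alpha trades them off."""
    mse = nn.functional.mse_loss
    return alpha * mse(emo_pred, emo_true) + (1 - alpha) * mse(nat_pred, nat_true)
```

Setting alpha to 1 recovers the single-task emotion model, which is one way such multitask/single-task comparisons are run.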
Published in:
Applied Sciences, Vol 13, Iss 10, p 6239 (2023)
We previously investigated the perception of noise-vocoded speech to determine whether the temporal amplitude envelope (TAE) of speech plays an important role in the perception of linguistic information as well as non-linguistic information. However, …
External link:
https://doaj.org/article/f69c317018dd4a3c9656344d884ed1ca
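Noise-vocoded speech is produced by splitting speech into frequency bands, extracting each band's temporal amplitude envelope (TAE), and re-imposing those envelopes on band-limited noise, so the fine structure is discarded while the TAE is preserved. A rough scipy sketch (filter order, band edges, and the Hilbert-envelope choice are our assumptions):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(speech, fs, band_edges=(80, 500, 1500, 4000)):
    """Replace spectral fine structure with noise while keeping each
    band's temporal amplitude envelope."""
    rng = np.random.default_rng(0)
    out = np.zeros(len(speech))
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, speech)
        env = np.abs(hilbert(band))                          # TAE of this band
        noise = sosfiltfilt(sos, rng.standard_normal(len(speech)))
        out += env * noise                                   # noise carrier, speech envelope
    return out / (np.max(np.abs(out)) + 1e-8)
```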
Author:
Tuan Vu Ho, Masato Akagi
Published in:
IEEE Access, Vol 9, Pp 47503-47515 (2021)
This paper proposes a non-parallel cross-lingual voice conversion (CLVC) model that can mimic voice while continuously controlling speaker individuality on the basis of the variational autoencoder (VAE) and star generative adversarial network (StarGAN)…
External link:
https://doaj.org/article/616d41aac8b14b64afbe32601edc74b3
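The VAE side of such a model encodes input frames into a latent code and decodes it conditioned on a speaker embedding; interpolating between embeddings is one way to obtain the continuous control of speaker individuality the abstract mentions. A toy PyTorch sketch of that conditional-VAE idea (all names and sizes are ours; the StarGAN component is omitted):

```python
import torch
import torch.nn as nn

class SpeakerVAE(nn.Module):
    """Sketch: encode content to latent z, decode conditioned on a
    speaker embedding. Single linear layers stand in for real networks."""
    def __init__(self, n_feat=80, z_dim=16, spk_dim=8):
        super().__init__()
        self.enc = nn.Linear(n_feat, 2 * z_dim)        # -> mean, log-variance
        self.dec = nn.Linear(z_dim + spk_dim, n_feat)

    def forward(self, x, spk_emb):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        return self.dec(torch.cat([z, spk_emb], dim=-1)), mu, logvar

# Continuous speaker control: decode with a blend of two embeddings,
# spk = alpha * spk_a + (1 - alpha) * spk_b, sweeping alpha in [0, 1].
```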
Published in:
International Journal of Informatics, Information System and Computer Engineering, Vol 1, Iss 1, Pp 91-102 (2020)
Emotion can be inferred from tonal and verbal information, both of which can be extracted from speech. While most studies address categorical emotion recognition from a single modality, this research presents a dimensional emotion recognition…
External link:
https://doaj.org/article/3e7e9b35779d48d9835e24e8f77a81f0
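A common way to combine tonal (acoustic) and verbal (lexical) cues for dimensional emotion prediction is late fusion: embed each modality separately, concatenate, and regress the three attributes. A hypothetical sketch (embedding dimensions and layers are placeholders):

```python
import torch
import torch.nn as nn

class BimodalFusion(nn.Module):
    """Late fusion of acoustic and lexical embeddings into
    valence/arousal/dominance predictions."""
    def __init__(self, d_audio=128, d_text=256):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(d_audio + d_text, 64), nn.ReLU(),
            nn.Linear(64, 3),  # valence, arousal, dominance
        )

    def forward(self, audio_emb, text_emb):
        return self.head(torch.cat([audio_emb, text_emb], dim=-1))
```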
Published in:
IEEE Access, Vol 8, Pp 16560-16572 (2020)
Emotion information from speech can effectively help robots understand a speaker's intentions in natural human-robot interaction. The human auditory system can easily track the temporal dynamics of emotion by perceiving the intensity and fundamental frequency…
External link:
https://doaj.org/article/0f13d0eb0e6849aabbd453feb2e6de0f
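The two cues this abstract singles out, intensity and fundamental frequency, can be tracked frame by frame with standard tools. A short librosa sketch (frame sizes and pitch range are arbitrary choices):

```python
import librosa

def emotion_dynamics_features(path):
    """Frame-level intensity (RMS) and fundamental frequency (F0) tracks,
    the two cues highlighted for following emotion over time."""
    y, sr = librosa.load(path, sr=16000)
    rms = librosa.feature.rms(y=y, frame_length=1024, hop_length=256)[0]
    f0, voiced_flag, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"),
        sr=sr, frame_length=1024, hop_length=256)
    return rms, f0, voiced_flag  # f0 is NaN in unvoiced frames
```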
Published in:
Speech Communication, Vol 140, Pp 11-28
Published in:
Speech Communication, Vol 139, Pp 22-34
Published in:
Speech Communication, Vol 135, Pp 11-24
This study focuses on identifying effective features for controlling speech to increase speech intelligibility under adverse conditions. Previous approaches either cancel noise throughout speech presentation or preprocess speech by controlling its in…