Showing 1 - 10 of 37 for search: '"Lauri Juvela"'
Published in:
IEEE Access, Vol 7, Pp 17230-17246 (2019)
Speaking style conversion (SSC) is the technology of converting natural speech signals from one style to another. In this study, we aim to provide a general SSC system for converting styles with varying vocal effort and focus on normal-to-Lombard conversion …
External link:
https://doaj.org/article/75878375cd5f4fc79cd50856272b3c70
Published in:
Applied Sciences, Vol 10, Iss 3, p 766 (2020)
This article investigates the use of deep neural networks for black-box modelling of audio distortion circuits, such as guitar amplifiers and distortion pedals. Both a feedforward network, based on the WaveNet model, and a recurrent neural network model …
External link:
https://doaj.org/article/8b656def159c434093de564c9ff6c719
Authors:
Lauri Juvela, Eero-Pekka Damskägg, Aleksi Peussa, Jaakko Mäkinen, Thomas Sherson, Stylianos I. Mimilakis, Kimmo Rauhanen, Athanasios Gotsopoulos
Published in:
ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
Authors:
Aleksi Peussa, Eero-Pekka Damskägg, Thomas Sherson, Stylianos Mimilakis, Lauri Juvela, Athanasios Gotsopoulos, Vesa Valimaki
Published in:
Aalto University
Virtual analog (VA) modeling using neural networks (NNs) has great potential for rapidly producing high-fidelity models. Recurrent neural networks (RNNs) are especially appealing for VA due to their connection with discrete nodal analysis. Furthermore …
Published in:
IEEE/ACM Transactions on Audio, Speech, and Language Processing. 27:1019-1030
Recently, generative neural network models which operate directly on raw audio, such as WaveNet, have improved the state of the art in text-to-speech synthesis (TTS). Moreover, there is increasing interest in using these models as statistical vocoders …
Published in:
Bollepalli, B, Juvela, L, Airaksinen, M, Valentini Botinhao, C & Alku, P 2019, 'Normal-to-Lombard Adaptation of Speech Synthesis Using Long Short-Term Memory Recurrent Neural Networks', Speech Communication, vol. 110, pp. 64-75. https://doi.org/10.1016/j.specom.2019.04.008
In this article, three adaptation methods are compared based on how well they change the speaking style of a neural network based text-to-speech (TTS) voice. The speaking style conversion adopted here is from normal to Lombard speech. The selected adaptation methods …
Authors:
Hirokazu Kameoka, Hsin-Te Hwang, Driss Matrouf, Markus Becker, Quan Wang, Sahidullah, Ye Jia, Yu Zhang, Lauri Juvela, Hsin-Min Wang, Wen-Chin Huang, Zhen-Hua Ling, Yuan Jiang, Yi-Chiao Wu, Héctor Delgado, Massimiliano Todisco, Yu Tsao, Li-Juan Liu, Junichi Yamagishi, Jean-François Bonastre, Tomoki Toda, Nicholas Evans, Robert A. J. Clark, Kai Onuma, Yu-Huai Peng, Sébastien Le Maguer, Avashna Govender, Takashi Kaneda, Andreas Nautsch, Kong Aik Lee, Xin Wang, Srikanth Ronanki, Ville Vestman, Koji Mushika, Ingmar Steiner, Tomi Kinnunen, Fergus Henderson, Jing-Xuan Zhang, Kou Tanaka, Paavo Alku
Published in:
Computer Speech and Language
Computer Speech and Language, Elsevier, 2020, 64, pp. 101114. ⟨10.1016/j.csl.2020.101114⟩
Wang, X, Yamagishi, J, Todisco, M, Delgado, H, Nautsch, A, Evans, N, Sahidullah, M, Vestman, V, Kinnunen, T, Lee, K A, Juvela, L, Alku, P, Peng, Y-H, Hwang, H-T, Tsao, Y, Wang, H-M, Maguer, S L, Becker, M, Henderson, F, Clark, R, Zhang, Y, Wang, Q, Jia, Y, Onuma, K, Mushika, K, Kaneda, T, Jiang, Y, Liu, L-J, Wu, Y-C, Huang, W-C, Toda, T, Tanaka, K, Kameoka, H, Steiner, I, Matrouf, D, Bonastre, J-F, Govender, A, Ronanki, S, Zhang, J-X & Ling, Z-H 2020, 'ASVspoof 2019: A large-scale public database of synthesized, converted and replayed speech', Computer Speech and Language, vol. 64, 101114. https://doi.org/10.1016/j.csl.2020.101114
Automatic speaker verification (ASV) is one of the most natural and convenient means of biometric person recognition. Unfortunately, just like all other biometric systems, ASV is vulnerable to spoofing, also referred to as "presentation attacks". The …
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::008166e7f7bcf01a98428755ccc18525
https://hal.science/hal-02945493/document
Published in:
Applied Sciences, Vol 10, Iss 3, p 766 (2020)
This article investigates the use of deep neural networks for black-box modelling of audio distortion circuits, such as guitar amplifiers and distortion pedals. Both a feedforward network, based on the WaveNet model, and a recurrent neural network model …
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::6d43a324d3df6affdd5a185a28b88e08
https://aaltodoc.aalto.fi/handle/123456789/43739
Published in:
INTERSPEECH
This paper adapts a StyleGAN model for speech generation with minimal or no conditioning on text. StyleGAN is a multi-scale convolutional GAN capable of hierarchically capturing data structure and latent variation on multiple spatial (or temporal) levels …
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::aaec026025f9843cb813c2d67e43df21