Showing 1 - 10 of 145 for search: '"Beskow, Jonas"'
In this paper, we present a novel dataset captured using a VR headset to record conversations between participants within a physics simulator (AI2-THOR). Our primary objective is to extend the field of co-speech gesture generation by incorporating …
External link:
http://arxiv.org/abs/2410.00253
This paper focuses on enhancing human-agent communication by integrating spatial context into virtual agents' non-verbal behaviors, specifically gestures. Recent advances in co-speech gesture generation have primarily utilized data-driven methods, …
External link:
http://arxiv.org/abs/2408.04127
Author:
Mehta, Shivam, Lameris, Harm, Punmiya, Rajiv, Beskow, Jonas, Székely, Éva, Henter, Gustav Eje
Converting input symbols to output audio in TTS requires modelling the durations of speech sounds. Leading non-autoregressive (NAR) TTS models treat duration modelling as a regression problem. The same utterance is then spoken with identical timings …
External link:
http://arxiv.org/abs/2406.05401
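
The regression framing this abstract describes is easy to sketch: a deterministic network maps each symbol encoding to a single (log-)duration and is trained with MSE, so repeated synthesis of the same text necessarily yields identical timings. Below is a minimal, hypothetical PyTorch sketch of such a predictor; all names and shapes are illustrative assumptions, not the paper's implementation.

# Hypothetical sketch of regression-style duration modelling: one scalar
# log-duration per input symbol, trained with MSE. Deterministic mapping,
# hence identical timings on every synthesis of the same utterance.
import torch
import torch.nn as nn

class RegressionDurationPredictor(nn.Module):
    def __init__(self, d_model: int = 192, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # one log-duration per symbol
        )

    def forward(self, symbol_encodings: torch.Tensor) -> torch.Tensor:
        # symbol_encodings: (batch, num_symbols, d_model)
        return self.net(symbol_encodings).squeeze(-1)  # (batch, num_symbols)

encoder_out = torch.randn(2, 17, 192)   # stand-in TTS encoder output
target_log_durs = torch.randn(2, 17)    # stand-in ground-truth log-durations
predictor = RegressionDurationPredictor()
loss = nn.functional.mse_loss(predictor(encoder_out), target_log_durs)
loss.backward()

A probabilistic duration model, by contrast, would sample from a learned distribution over durations, recovering the durational variability that a point-estimate regressor cannot express.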
Author:
Mehta, Shivam, Deichler, Anna, O'Regan, Jim, Moëll, Birger, Beskow, Jonas, Henter, Gustav Eje, Alexanderson, Simon
Although humans engaged in face-to-face conversation simultaneously communicate both verbally and non-verbally, methods for joint and unified synthesis of speech audio and co-speech 3D gesture motion from text are a new and emerging field. These technologies …
External link:
http://arxiv.org/abs/2404.19622
Author:
Mehta, Shivam, Tu, Ruibo, Alexanderson, Simon, Beskow, Jonas, Székely, Éva, Henter, Gustav Eje
As text-to-speech technologies achieve remarkable naturalness in read-aloud tasks, there is growing interest in multimodal synthesis of verbal and non-verbal communicative behaviour, such as spontaneous speech and associated body gestures. This paper …
External link:
http://arxiv.org/abs/2310.05181
This paper describes a system developed for the GENEA (Generation and Evaluation of Non-verbal Behaviour for Embodied Agents) Challenge 2023. Our solution builds on an existing diffusion-based motion synthesis model. We propose a contrastive speech and motion …
External link:
http://arxiv.org/abs/2309.05455
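
Lacking the entry's exact module, here is a generic sketch of what contrastive speech-and-motion pretraining typically looks like, assuming a CLIP-style symmetric InfoNCE objective over time-aligned speech/motion window pairs; the encoders, function names, and temperature are illustrative assumptions, not the challenge entry's design.

# Generic contrastive speech-motion objective: embeddings of matching
# speech/motion windows are pulled together, mismatched pairs pushed apart.
import torch
import torch.nn.functional as F

def contrastive_loss(speech_emb: torch.Tensor,
                     motion_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    # speech_emb, motion_emb: (batch, dim); row i of each tensor comes
    # from the same time-aligned speech/motion window.
    s = F.normalize(speech_emb, dim=-1)
    m = F.normalize(motion_emb, dim=-1)
    logits = s @ m.t() / temperature       # (batch, batch) similarities
    targets = torch.arange(s.size(0))      # matching pairs on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

loss = contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))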
We introduce Matcha-TTS, a new encoder-decoder architecture for speedy TTS acoustic modelling, trained using optimal-transport conditional flow matching (OT-CFM). This yields an ODE-based decoder capable of high output quality in fewer synthesis steps …
External link:
http://arxiv.org/abs/2309.03199
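
The OT-CFM objective named in the abstract admits a compact worked example: sample a time t, move a noise sample toward a data sample along a straight conditional path, and regress a vector-field network onto the path's constant target velocity. The sketch below follows the standard OT-CFM formulation; the tiny vector-field network is a placeholder, not Matcha-TTS's actual decoder.

# Minimal OT-CFM training step: x_t = (1 - (1 - sigma_min) t) x0 + t x1,
# with constant target velocity u_t = x1 - (1 - sigma_min) x0.
import torch
import torch.nn as nn

sigma_min = 1e-4
vector_field = nn.Sequential(nn.Linear(80 + 1, 256), nn.SiLU(),
                             nn.Linear(256, 80))   # placeholder v_theta

x1 = torch.randn(16, 80)                        # data sample (e.g. mel frame)
x0 = torch.randn_like(x1)                       # noise sample
t = torch.rand(16, 1)                           # flow time in [0, 1]
x_t = (1 - (1 - sigma_min) * t) * x0 + t * x1   # point on conditional path
u_t = x1 - (1 - sigma_min) * x0                 # constant target velocity

v = vector_field(torch.cat([x_t, t], dim=-1))   # v_theta(x_t, t)
loss = ((v - u_t) ** 2).mean()                  # OT-CFM regression loss
loss.backward()

Because the conditional paths are straight, the learned ODE can be integrated accurately in few steps at synthesis time, which is what the abstract's claim about fewer synthesis steps refers to.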
Author:
Mehta, Shivam, Wang, Siyang, Alexanderson, Simon, Beskow, Jonas, Székely, Éva, Henter, Gustav Eje
With read-aloud speech synthesis achieving high naturalness scores, there is a growing research interest in synthesising spontaneous speech. However, human spontaneous face-to-face conversation has both spoken and non-verbal aspects (here, co-speech gestures) …
External link:
http://arxiv.org/abs/2306.09417
Published in:
ACM Trans. Graph. 42, 4 (August 2023), 20 pages
Diffusion models have experienced a surge of interest as highly expressive yet efficiently trainable probabilistic models. We show that these models are an excellent fit for synthesising human motion that co-occurs with audio, e.g., dancing and co-speech gesticulation …
External link:
http://arxiv.org/abs/2211.09707
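
For readers unfamiliar with diffusion-based motion synthesis, a minimal denoising training step looks like the following: corrupt a clean pose vector with Gaussian noise at a random timestep and train a network to predict that noise given aligned audio features. This is a generic DDPM-style sketch under assumed shapes and a placeholder denoiser, not the paper's architecture or conditioning scheme.

# Generic DDPM training step for audio-conditioned motion: forward-diffuse
# the pose, then regress the denoiser onto the injected noise (epsilon-loss).
import torch
import torch.nn as nn

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)      # cumulative alpha-bar_t

pose_dim, audio_dim = 63, 40                       # assumed feature sizes
denoiser = nn.Sequential(nn.Linear(pose_dim + audio_dim + 1, 256),
                         nn.SiLU(), nn.Linear(256, pose_dim))

x0 = torch.randn(8, pose_dim)                      # stand-in clean poses
audio = torch.randn(8, audio_dim)                  # aligned audio features
t = torch.randint(0, T, (8,))
noise = torch.randn_like(x0)
a = alpha_bar[t].unsqueeze(-1)
x_t = a.sqrt() * x0 + (1 - a).sqrt() * noise       # forward diffusion q(x_t|x0)

pred = denoiser(torch.cat([x_t, audio, t.unsqueeze(-1) / T], dim=-1))
loss = ((pred - noise) ** 2).mean()                # simple epsilon-prediction loss
loss.backward()

At synthesis time, the learned denoiser is applied iteratively from pure noise, conditioned on the audio, to sample motion; the probabilistic formulation is what lets such models handle the ambiguity of motion given audio that the abstract highlights.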