Showing 1 - 1 of 1 for search: '"Kadhim, Hashiam"'
In this work we introduce NWT, an expressive speech-to-video model. Unlike approaches that use domain-specific intermediate representations such as pose keypoints, NWT learns its own latent representations, with minimal assumptions about the audio an
External link:
http://arxiv.org/abs/2106.04283