paGAN
Authors: Aviral Agarwal, Jaewoo Seo, Shunsuke Saito, Lingyu Wei, Jun Xing, Koki Nagano, Zimo Li, Jens Fursund, Hao Li
Year of publication: 2018
Subjects: Image-based modeling and rendering; Computer graphics; Image processing and computer vision; Computer vision; Artificial intelligence; Computer facial animation; Texture synthesis; Mobile devices
Source: ACM Transactions on Graphics 37:1–12
ISSN: 0730-0301, 1557-7368
DOI: 10.1145/3272127.3275075
Abstract: With the rising interest in personalized VR and gaming experiences comes the need to create high-quality 3D avatars that are both low-cost and variegated. Because of this, building dynamic avatars from a single unconstrained input image is becoming a popular application. While previous techniques that attempt this require multiple input images or rely on transferring dynamic facial appearance from a source actor, we are able to do so using only one 2D input image, without any form of transfer from a source image. We achieve this with a new conditional Generative Adversarial Network design that allows fine-scale manipulation of any facial input image into a new expression while preserving its identity. Our photoreal avatar GAN (paGAN) can also synthesize the unseen mouth interior and control the eye-gaze direction of the output, as well as produce the final image from a novel viewpoint. The method is even capable of generating fully controllable, temporally stable video sequences, despite not using temporal information during training. After training, we can use our network to produce dynamic image-based avatars that are controllable on mobile devices in real time. To do this, we compute a fixed set of output images that correspond to key blendshapes, from which we extract textures in UV space. Using a subject's expression blendshape weights at run time, we can linearly blend these key textures together to achieve the desired appearance (a minimal sketch of this blending step follows this record). Furthermore, we can use the mouth interior and eye textures produced by our network to synthesize on-the-fly avatar animations for those regions. Our work produces image and video synthesis of state-of-the-art quality, and is, to our knowledge, the first able to generate a dynamically textured avatar with a mouth interior, all from a single image.
Database: OpenAIRE
External link:
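
The run-time step described in the abstract, linearly blending a fixed set of precomputed key-expression UV textures using the subject's blendshape weights, can be illustrated with a minimal sketch. The function name, array layout, and the assumption that a neutral texture is included among the keys are illustrative choices, not details taken from the paper.

```python
import numpy as np

def blend_key_textures(key_textures, blend_weights):
    """Minimal sketch of run-time texture blending (assumed data layout).

    key_textures : (K, H, W, 3) UV textures precomputed offline, one per
                   key blendshape (a neutral texture may be one of the keys).
    blend_weights: (K,) blendshape coefficients tracked at run time,
                   assumed here to be non-negative and to sum to about 1.
    Returns the blended (H, W, 3) texture for the current frame.
    """
    keys = np.asarray(key_textures, dtype=np.float32)
    w = np.asarray(blend_weights, dtype=np.float32)
    # A single weighted sum over the key textures: contract the weight
    # vector against the first (key) axis of the texture stack.
    return np.tensordot(w, keys, axes=1)

# Usage with hypothetical sizes: 4 key expressions, 256x256 RGB textures.
keys = np.random.rand(4, 256, 256, 3).astype(np.float32)
weights = np.array([0.6, 0.3, 0.1, 0.0], dtype=np.float32)
frame_texture = blend_key_textures(keys, weights)  # shape (256, 256, 3)
```

Because the per-frame cost is a single weighted sum over a small, fixed set of textures, this step is cheap enough for real-time playback on mobile devices, which is consistent with the abstract's description.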