Consistent123: Improve Consistency for One Image to 3D Object Synthesis

Author:	Weng, Haohan; Yang, Tianyu; Wang, Jianan; Li, Yu; Zhang, Tong; Chen, C. L. Philip; Zhang, Lei
Year of publication:	2023
Subject:	Computer Science - Computer Vision and Pattern Recognition
Document type:	Working Paper
Description:	Large image diffusion models enable novel view synthesis with high quality and excellent zero-shot capability. However, such models are based on image-to-image translation and offer no guarantee of view consistency, which limits their performance on downstream tasks such as 3D reconstruction and image-to-3D generation. To improve consistency, we propose Consistent123, which synthesizes novel views simultaneously by incorporating additional cross-view attention layers and a shared self-attention mechanism. The proposed attention mechanism improves the interaction across all synthesized views, as well as the alignment between the condition view and the novel views. In the sampling stage, this architecture supports generating an arbitrary number of views simultaneously while being trained at a fixed length. We also introduce a progressive classifier-free guidance strategy to balance texture and geometry in the synthesized object views. Qualitative and quantitative experiments show that Consistent123 outperforms baselines in view consistency by a large margin. Furthermore, we demonstrate a significant improvement of Consistent123 on various downstream tasks, showing its great potential in the 3D generation field. The project page, with more qualitative results, is available at https://consistent-123.github.io/
Database:	arXiv
External link:	http://arxiv.org/abs/2310.08092
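
Note on the description: the progressive classifier-free guidance strategy it mentions can be illustrated with a minimal sketch. The combination rule below is the standard classifier-free guidance formula; the linear schedule, its direction, the default scale values, and the names `guidance_scale` and `cfg_combine` are assumptions made for illustration only and are not taken from the record or confirmed against the paper.

```python
def guidance_scale(step, num_steps, start=1.0, end=3.0):
    """Hypothetical linear schedule for the guidance scale.

    The record only says the guidance is "progressive"; the linear
    shape and the start/end values here are illustrative assumptions.
    """
    if num_steps < 2:
        return end
    t = step / (num_steps - 1)  # 0.0 at the first step, 1.0 at the last
    return start + t * (end - start)


def cfg_combine(eps_uncond, eps_cond, scale):
    """Standard classifier-free guidance on two noise predictions.

    Returns eps_uncond + scale * (eps_cond - eps_uncond), element-wise.
    Plain lists of floats stand in for the model's output tensors.
    """
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


if __name__ == "__main__":
    num_steps = 5
    eps_uncond = [0.0, 1.0]  # placeholder unconditional prediction
    eps_cond = [1.0, 3.0]    # placeholder conditional prediction
    for step in range(num_steps):
        s = guidance_scale(step, num_steps)
        print(step, s, cfg_combine(eps_uncond, eps_cond, s))
```

In a real sampler the two predictions would come from the diffusion model with and without the condition view, and the schedule would be tuned for the texture and geometry trade-off the description refers to.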