Latent diffusion transformer for point cloud generation.

Author: Ji, Junzhong; Zhao, Runfeng; Lei, Minglong
Source: Visual Computer; Jun 2024, Vol. 40, Issue 6, p3903-3917, 15p
Abstract: Diffusion models have recently been applied successfully to point cloud generation. The main idea is to use a forward process that progressively adds noise to point clouds, and a reverse process that generates point clouds by denoising. However, because point cloud data is high-dimensional and exhibits complex structure, it is challenging to adequately capture the surface distribution of point clouds. Moreover, point cloud generation methods often resort to sampling methods and local operations to extract features, which inevitably ignores the global structures and overall shapes of point clouds. To address these limitations, we propose a latent diffusion model based on transformers for point cloud generation. Instead of building a diffusion process directly on the points, we first propose a latent compressor that converts the original point clouds into a set of latent tokens before feeding them into the diffusion model. Converting point clouds into latent tokens not only improves expressiveness but also offers better flexibility, since the tokens can adapt to various downstream tasks. We carefully design the latent compressor as an attention-based autoencoder to capture global structures in point clouds. We then use transformers as the backbone of the latent diffusion module to preserve these global structures. The powerful feature extraction ability of transformers ensures the high quality and smoothness of the generated point clouds. Experiments show that our method achieves superior performance in both unconditional generation on ShapeNet and multi-modal point cloud completion on ShapeNet-ViPC. Our code and samples are publicly available at https://github.com/Negai-98/LDT. [ABSTRACT FROM AUTHOR]
Database: Complementary Index
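
To make the pipeline described in the abstract concrete, here is a minimal PyTorch sketch of the two ingredients it names: an attention-based latent compressor that maps a point cloud to a small set of latent tokens, and a transformer denoiser that runs DDPM-style diffusion in that latent space. All module names, layer sizes, the timestep embedding, and the training loop are illustrative assumptions, not the authors' implementation; the official code lives at the GitHub link above.

```python
# Sketch only: hypothetical module names and sizes, not the official LDT code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LatentCompressor(nn.Module):
    """Encode N x 3 points into M latent tokens via cross-attention with
    learned queries, then decode tokens back to points (autoencoder sketch)."""
    def __init__(self, n_tokens=32, dim=128, n_points=2048):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_tokens, dim))  # learned latent queries
        self.point_embed = nn.Linear(3, dim)
        self.enc_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.dec = nn.Sequential(nn.Linear(dim, dim), nn.GELU(),
                                 nn.Linear(dim, (n_points // n_tokens) * 3))
        self.n_points = n_points

    def encode(self, pts):                         # pts: (B, N, 3)
        kv = self.point_embed(pts)                 # per-point features
        q = self.queries.unsqueeze(0).expand(pts.size(0), -1, -1)
        z, _ = self.enc_attn(q, kv, kv)            # (B, M, dim): attention sees all
        return z                                   # points, so tokens are global

    def decode(self, z):                           # z: (B, M, dim)
        patches = self.dec(z)                      # each token emits a patch of points
        return patches.reshape(z.size(0), self.n_points, 3)


class LatentDenoiser(nn.Module):
    """Transformer backbone that predicts the noise added to latent tokens."""
    def __init__(self, dim=128, depth=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, nhead=4, dim_feedforward=4 * dim,
                                           batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=depth)
        self.time_embed = nn.Sequential(nn.Linear(1, dim), nn.SiLU(), nn.Linear(dim, dim))

    def forward(self, z_t, t):                     # z_t: (B, M, dim), t: (B,)
        temb = self.time_embed(t.float().unsqueeze(-1) / 1000.0)  # crude timestep embedding
        return self.blocks(z_t + temb.unsqueeze(1))               # predicted noise


# DDPM-style forward process on the latents:
#   z_t = sqrt(alpha_bar_t) * z_0 + sqrt(1 - alpha_bar_t) * eps
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

compressor, denoiser = LatentCompressor(), LatentDenoiser()
pts = torch.randn(8, 2048, 3)                      # stand-in for a ShapeNet batch
z0 = compressor.encode(pts)                        # point cloud -> latent tokens
t = torch.randint(0, T, (8,))
eps = torch.randn_like(z0)
ab = alpha_bar[t].view(-1, 1, 1)
z_t = ab.sqrt() * z0 + (1.0 - ab).sqrt() * eps     # noised latent tokens
loss = F.mse_loss(denoiser(z_t, t), eps)           # epsilon-prediction objective
loss.backward()
```

In a typical latent diffusion setup the compressor would be trained first as an autoencoder with a reconstruction loss (e.g., Chamfer distance) and then frozen before the denoiser is trained; the sketch above collapses both stages into one step purely for brevity. Sampling would run the usual reverse diffusion over the tokens and finish with `compressor.decode` to recover points.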