Search results - "Jia, Shaochong"

Report

An Intermediate Fusion ViT Enables Efficient Text-Image Alignment in Diffusion Models

Authors: Hu, Zizhao; Jia, Shaochong; Rostami, Mohammad

Diffusion models have been widely used for conditional data cross-modal generation tasks such as text-to-image and text-to-video. However, state-of-the-art models still fail to align the generated visual concepts with high-level semantics in a langua

External link: http://arxiv.org/abs/2403.16530

