Showing 1 - 1 of 1 for search: '"Jia, Shaochong"'
Diffusion models have been widely used for conditional data cross-modal generation tasks such as text-to-image and text-to-video. However, state-of-the-art models still fail to align the generated visual concepts with high-level semantics in a langua
External link:
http://arxiv.org/abs/2403.16530