Showing 1 - 10 of 81,155 for search: '"high‐resolution image"'
Denoising with a Joint-Embedding Predictive Architecture (D-JEPA), an autoregressive model, has demonstrated outstanding performance in class-conditional image generation. However, the application of next-token prediction in high-resolution text-to-i
External link:
http://arxiv.org/abs/2411.14808
Authors:
Han, Jian; Liu, Jinlai; Jiang, Yi; Yan, Bin; Zhang, Yuqi; Yuan, Zehuan; Peng, Bingyue; Liu, Xiaobing
We present Infinity, a Bitwise Visual AutoRegressive Modeling capable of generating high-resolution, photorealistic images following language instruction. Infinity redefines visual autoregressive model under a bitwise token prediction framework with
External link:
http://arxiv.org/abs/2412.04431
Authors:
Yang, Haosen; Bulat, Adrian; Hadji, Isma; Pham, Hai X.; Zhu, Xiatian; Tzimiropoulos, Georgios; Martinez, Brais
Diffusion models are proficient at generating high-quality images. They are however effective only when operating at the resolution used during training. Inference at a scaled resolution leads to repetitive patterns and structural distortions. Retrai
External link:
http://arxiv.org/abs/2411.18552
Authors:
Xie, Enze; Chen, Junsong; Chen, Junyu; Cai, Han; Tang, Haotian; Lin, Yujun; Zhang, Zhekai; Li, Muyang; Zhu, Ligeng; Lu, Yao; Han, Song
We introduce Sana, a text-to-image framework that can efficiently generate images up to 4096$\times$4096 resolution. Sana can synthesize high-resolution, high-quality images with strong text-image alignment at a remarkably fast speed, deployable on l
External link:
http://arxiv.org/abs/2410.10629
Latent diffusion models (LDMs), such as Stable Diffusion, often experience significant structural distortions when directly generating high-resolution (HR) images that exceed their original training resolutions. A straightforward and cost-effective s
External link:
http://arxiv.org/abs/2410.06055
Implicit representation mapping (IRM) can translate image features to any continuous resolution, showcasing its potent capability for ultra-high-resolution image segmentation refinement. Current IRM-based methods for refining ultra-high-resolution im
External link:
http://arxiv.org/abs/2407.21256
Authors:
Reidy, Brendan; Tabrizchi, Sepehr; Mohammadi, Mohamadreza; Angizi, Shaahin; Roohi, Arman; Zand, Ramtin
With the rise of tiny IoT devices powered by machine learning (ML), many researchers have directed their focus toward compressing models to fit on tiny edge devices. Recent works have achieved remarkable success in compressing ML models for object de
External link:
http://arxiv.org/abs/2408.03956
Authors:
Ren, Jingjing; Li, Wenbo; Chen, Haoyu; Pei, Renjing; Shao, Bin; Guo, Yong; Peng, Long; Song, Fenglong; Zhu, Lei
Ultra-high-resolution image generation poses great challenges, such as increased semantic planning complexity and detail synthesis difficulties, alongside substantial training resource demands. We present UltraPixel, a novel architecture utilizing ca
External link:
http://arxiv.org/abs/2407.02158
This paper presents a novel hybrid quantum generative model, the VAE-QWGAN, which combines the strengths of a classical Variational AutoEncoder (VAE) with a hybrid Quantum Wasserstein Generative Adversarial Network (QWGAN). The VAE-QWGAN integrates t
External link:
http://arxiv.org/abs/2409.10339
Multimodal large language models (MLLMs) have experienced significant advancements recently, but still struggle to recognize and interpret intricate details in high-resolution (HR) images effectively. While state-of-the-art (SOTA) MLLMs claim to proc
External link:
http://arxiv.org/abs/2408.15556