Showing 1 - 10 of 168 for search: '"Chen, Liang-Chieh"'
This paper presents innovative enhancements to diffusion models by integrating a novel multi-resolution network and time-dependent layer normalization. Diffusion models have gained prominence for their effectiveness in high-fidelity image generation.
External link:
http://arxiv.org/abs/2406.09416
Recent advancements in generative models have highlighted the crucial role of image tokenization in the efficient synthesis of high-resolution images. Tokenization, which transforms images into latent representations, reduces computational demands co
External link:
http://arxiv.org/abs/2406.07550
Recent advancements in zero-shot video diffusion models have shown promise for text-driven video editing, but challenges remain in achieving high temporal consistency. To address this, we introduce Video-3DGS, a 3D Gaussian Splatting (3DGS)-based vid
External link:
http://arxiv.org/abs/2406.02541
In recent decades, the vision community has witnessed remarkable progress in visual recognition, partially owing to advancements in dataset benchmarks. Notably, the established COCO benchmark has propelled the development of modern detection and segm
External link:
http://arxiv.org/abs/2404.08639
Recent breakthroughs in vision-language models (VLMs) start a new page in the vision community. The VLMs provide stronger and more generalizable feature embeddings compared to those from ImageNet-pretrained models, thanks to the training on the large
External link:
http://arxiv.org/abs/2404.02132
In this work, we introduce SPFormer, a novel Vision Transformer enhanced by superpixel representation. Addressing the limitations of traditional Vision Transformers' fixed-size, non-adaptive patch partitioning, SPFormer employs superpixels that adapt
External link:
http://arxiv.org/abs/2401.02931
Authors:
Rashwan, Abdullah, Zhang, Jiageng, Taalimi, Ali, Yang, Fan, Zhou, Xingyi, Yan, Chaochao, Chen, Liang-Chieh, Li, Yeqing
In recent years, transformer-based models have dominated panoptic segmentation, thanks to their strong modeling capabilities and their unified representation for both semantic and instance classes as global binary masks. In this paper, we revisit pur
External link:
http://arxiv.org/abs/2312.06052
Authors:
He, Ju, Yu, Qihang, Shin, Inkyu, Deng, Xueqing, Yuille, Alan, Shen, Xiaohui, Chen, Liang-Chieh
Video segmentation requires consistently segmenting and tracking objects over time. Due to the quadratic dependency on input size, directly applying self-attention to video segmentation with high-resolution input features poses significant challenges
External link:
http://arxiv.org/abs/2311.18537
Localizing and recognizing objects in the open-ended physical world poses a long-standing challenge within the domain of machine perception. Recent methods have endeavored to address the issue by employing a class-agnostic mask (or box) proposal mode
External link:
http://arxiv.org/abs/2311.08400
Authors:
Yang, Xuan, Yuan, Liangzhe, Wilber, Kimberly, Sharma, Astuti, Gu, Xiuye, Qiao, Siyuan, Debats, Stephanie, Wang, Huisheng, Adam, Hartwig, Sirotenko, Mikhail, Chen, Liang-Chieh
Dense prediction tasks, such as semantic segmentation, depth estimation, and surface normal prediction, can be easily formulated as per-pixel classification (discrete outputs) or regression (continuous outputs). This per-pixel prediction paradigm has
External link:
http://arxiv.org/abs/2311.05770