Search results - "Chen, Liang-Chieh"

Report

Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models

Authors: Liu, Qihao; Zeng, Zhanpeng; He, Ju; Yu, Qihang; Shen, Xiaohui; Chen, Liang-Chieh

This paper presents innovative enhancements to diffusion models by integrating a novel multi-resolution network and time-dependent layer normalization. Diffusion models have gained prominence for their effectiveness in high-fidelity image generation.

External link: http://arxiv.org/abs/2406.09416


Report

An Image is Worth 32 Tokens for Reconstruction and Generation

Authors: Yu, Qihang; Weber, Mark; Deng, Xueqing; Shen, Xiaohui; Cremers, Daniel; Chen, Liang-Chieh

Recent advancements in generative models have highlighted the crucial role of image tokenization in the efficient synthesis of high-resolution images. Tokenization, which transforms images into latent representations, reduces computational demands co…

External link: http://arxiv.org/abs/2406.07550
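The computational saving from a compact tokenization can be made concrete with simple token counting. The sketch below assumes a 256x256 input and a conventional 2D grid tokenizer with a downsampling factor of 16 (both are illustrative assumptions, not figures taken from the abstract), and compares the result to the 32 tokens named in the title. It says nothing about how the paper's tokenizer actually works.

```python
def grid_tokens(image_size: int, downsample: int) -> int:
    """Tokens from a 2D grid tokenizer: one token per (downsample x downsample) patch.

    Assumes a square image whose side is divisible by `downsample`.
    """
    side = image_size // downsample
    return side * side


if __name__ == "__main__":
    grid = grid_tokens(256, 16)   # 16 x 16 grid -> 256 tokens (assumed setup)
    compact = 32                  # token count from the paper's title
    print(grid, grid // compact)  # prints: 256 8
```

Under these assumptions a generator working on the compact sequence handles 8x fewer tokens per image, which matters most for components whose cost grows faster than linearly in sequence length.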


Report

Enhancing Temporal Consistency in Video Editing by Reconstructing Videos with 3D Gaussian Splatting

Authors: Shin, Inkyu; Yu, Qihang; Shen, Xiaohui; Kweon, In So; Yoon, Kuk-Jin; Chen, Liang-Chieh

Recent advancements in zero-shot video diffusion models have shown promise for text-driven video editing, but challenges remain in achieving high temporal consistency. To address this, we introduce Video-3DGS, a 3D Gaussian Splatting (3DGS)-based vid…

External link: http://arxiv.org/abs/2406.02541


Report

COCONut: Modernizing COCO Segmentation

Authors: Deng, Xueqing; Yu, Qihang; Wang, Peng; Shen, Xiaohui; Chen, Liang-Chieh

In recent decades, the vision community has witnessed remarkable progress in visual recognition, partially owing to advancements in dataset benchmarks. Notably, the established COCO benchmark has propelled the development of modern detection and segm…

External link: http://arxiv.org/abs/2404.08639


Report

ViTamin: Designing Scalable Vision Models in the Vision-Language Era

Authors: Chen, Jieneng; Yu, Qihang; Shen, Xiaohui; Yuille, Alan; Chen, Liang-Chieh

Recent breakthroughs in vision-language models (VLMs) start a new page in the vision community. The VLMs provide stronger and more generalizable feature embeddings compared to those from ImageNet-pretrained models, thanks to the training on the large…

External link: http://arxiv.org/abs/2404.02132


Report

SPFormer: Enhancing Vision Transformer with Superpixel Representation

Authors: Mei, Jieru; Chen, Liang-Chieh; Yuille, Alan; Xie, Cihang

In this work, we introduce SPFormer, a novel Vision Transformer enhanced by superpixel representation. Addressing the limitations of traditional Vision Transformers' fixed-size, non-adaptive patch partitioning, SPFormer employs superpixels that adapt…

External link: http://arxiv.org/abs/2401.02931


Report

MaskConver: Revisiting Pure Convolution Model for Panoptic Segmentation

Authors: Rashwan, Abdullah; Zhang, Jiageng; Taalimi, Ali; Yang, Fan; Zhou, Xingyi; Yan, Chaochao; Chen, Liang-Chieh; Li, Yeqing

In recent years, transformer-based models have dominated panoptic segmentation, thanks to their strong modeling capabilities and their unified representation for both semantic and instance classes as global binary masks. In this paper, we revisit pur…

External link: http://arxiv.org/abs/2312.06052


Report

A Simple Video Segmenter by Tracking Objects Along Axial Trajectories

Authors: He, Ju; Yu, Qihang; Shin, Inkyu; Deng, Xueqing; Yuille, Alan; Shen, Xiaohui; Chen, Liang-Chieh

Video segmentation requires consistently segmenting and tracking objects over time. Due to the quadratic dependency on input size, directly applying self-attention to video segmentation with high-resolution input features poses significant challenges…

External link: http://arxiv.org/abs/2311.18537
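The quadratic dependency mentioned in the abstract can be illustrated by counting attention scores. This minimal sketch assumes one token per spatial location per frame of a T x H x W feature map, so full self-attention over N tokens computes N x N scores; the clip length and feature-map sizes are hypothetical, and the code illustrates only the cost claim, not the axial-trajectory method the paper proposes.

```python
def attention_entries(frames: int, height: int, width: int) -> int:
    """Number of pairwise scores in full self-attention over a video clip.

    Assumes one token per spatial location per frame, so
    N = frames * height * width and the attention matrix has N * N entries.
    """
    num_tokens = frames * height * width
    return num_tokens * num_tokens


if __name__ == "__main__":
    # Hypothetical clip: 4 frames of a 32x64 feature map.
    base = attention_entries(4, 32, 64)
    # Doubling both spatial dimensions gives 4x the tokens...
    doubled = attention_entries(4, 64, 128)
    # ...and therefore 16x the attention scores.
    print(base, doubled, doubled // base)  # prints: 67108864 1073741824 16
```

Doubling the input resolution in both spatial dimensions multiplies the token count by 4 but the attention cost by 16, which is why full self-attention over high-resolution video features quickly becomes impractical.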


Report

Towards Open-Ended Visual Recognition with Large Language Model

Authors: Yu, Qihang; Shen, Xiaohui; Chen, Liang-Chieh

Localizing and recognizing objects in the open-ended physical world poses a long-standing challenge within the domain of machine perception. Recent methods have endeavored to address the issue by employing a class-agnostic mask (or box) proposal mode…

External link: http://arxiv.org/abs/2311.08400


Report

PolyMaX: General Dense Prediction with Mask Transformer

Authors: Yang, Xuan; Yuan, Liangzhe; Wilber, Kimberly; Sharma, Astuti; Gu, Xiuye; Qiao, Siyuan; Debats, Stephanie; Wang, Huisheng; Adam, Hartwig; Sirotenko, Mikhail; Chen, Liang-Chieh

Dense prediction tasks, such as semantic segmentation, depth estimation, and surface normal prediction, can be easily formulated as per-pixel classification (discrete outputs) or regression (continuous outputs). This per-pixel prediction paradigm has…

External link: http://arxiv.org/abs/2311.05770
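The two per-pixel formulations named in the abstract can be sketched on a toy image. A classification head produces one score per class at every pixel and takes the argmax (a discrete label, as in semantic segmentation). A regression head produces one real value per pixel (a continuous output, as in depth estimation). This is a plain-Python illustration with made-up scores; the function names and the affine "regression head" are hypothetical stand-ins, and the code shows the baseline paradigm the abstract describes, not PolyMaX itself.

```python
from typing import List

Grid = List[List[float]]


def per_pixel_classification(logits: List[List[List[float]]]) -> List[List[int]]:
    """Discrete output: pick the highest-scoring class independently at each pixel.

    `logits[y][x]` holds one score per class for pixel (y, x).
    """
    return [
        [max(range(len(scores)), key=scores.__getitem__) for scores in row]
        for row in logits
    ]


def per_pixel_regression(features: Grid, scale: float, bias: float) -> Grid:
    """Continuous output: map each pixel's feature to a real value (e.g. depth).

    A single affine map stands in for a learned regression head (an assumption).
    """
    return [[scale * f + bias for f in row] for row in features]


if __name__ == "__main__":
    # A 1x2 image with 3 classes per pixel (made-up scores).
    logits = [[[0.1, 2.0, 0.3], [1.5, 0.2, 0.9]]]
    print(per_pixel_classification(logits))  # prints: [[1, 0]]
    print(per_pixel_regression([[0.5, 1.0]], scale=2.0, bias=0.1))
```

In both cases every pixel is predicted independently of the others, which is the property the per-pixel paradigm shares across tasks.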

