Showing 1 - 10 of 111 for search: '"Yu, Qihang"'
This paper presents innovative enhancements to diffusion models by integrating a novel multi-resolution network and time-dependent layer normalization. Diffusion models have gained prominence for their effectiveness in high-fidelity image generation.
External link:
http://arxiv.org/abs/2406.09416
Recent advancements in generative models have highlighted the crucial role of image tokenization in the efficient synthesis of high-resolution images. Tokenization, which transforms images into latent representations, reduces computational demands co…
External link:
http://arxiv.org/abs/2406.07550
Recent advancements in zero-shot video diffusion models have shown promise for text-driven video editing, but challenges remain in achieving high temporal consistency. To address this, we introduce Video-3DGS, a 3D Gaussian Splatting (3DGS)-based vid…
External link:
http://arxiv.org/abs/2406.02541
Author:
Yu, Hongyun, Qu, Zhan, Yu, Qihang, Chen, Jianchuan, Jiang, Zhonghua, Chen, Zhiwen, Zhang, Shengyu, Xu, Jimin, Wu, Fei, Lv, Chengfei, Yu, Gang
Recent works on audio-driven talking head synthesis using Neural Radiance Fields (NeRF) have achieved impressive results. However, due to inadequate pose and expression control caused by NeRF implicit representation, these methods still have some lim…
External link:
http://arxiv.org/abs/2404.14037
In recent decades, the vision community has witnessed remarkable progress in visual recognition, partially owing to advancements in dataset benchmarks. Notably, the established COCO benchmark has propelled the development of modern detection and segm…
External link:
http://arxiv.org/abs/2404.08639
Recent breakthroughs in vision-language models (VLMs) start a new page in the vision community. The VLMs provide stronger and more generalizable feature embeddings compared to those from ImageNet-pretrained models, thanks to the training on the large…
External link:
http://arxiv.org/abs/2404.02132
Author:
He, Ju, Yu, Qihang, Shin, Inkyu, Deng, Xueqing, Yuille, Alan, Shen, Xiaohui, Chen, Liang-Chieh
Video segmentation requires consistently segmenting and tracking objects over time. Due to the quadratic dependency on input size, directly applying self-attention to video segmentation with high-resolution input features poses significant challenges…
External link:
http://arxiv.org/abs/2311.18537
Localizing and recognizing objects in the open-ended physical world poses a long-standing challenge within the domain of machine perception. Recent methods have endeavored to address the issue by employing a class-agnostic mask (or box) proposal mode…
External link:
http://arxiv.org/abs/2311.08400
Author:
Chen, Jieneng, Mei, Jieru, Li, Xianhang, Lu, Yongyi, Yu, Qihang, Wei, Qingyue, Luo, Xiangde, Xie, Yutong, Adeli, Ehsan, Wang, Yan, Lungren, Matthew, Xing, Lei, Lu, Le, Yuille, Alan, Zhou, Yuyin
Medical image segmentation plays a crucial role in advancing healthcare systems for disease diagnosis and treatment planning. The u-shaped architecture, popularly known as U-Net, has proven highly successful for various medical image segmentation tas…
External link:
http://arxiv.org/abs/2310.07781
Open-vocabulary segmentation is a challenging task requiring segmenting and recognizing objects from an open set of categories. One way to address this challenge is to leverage multi-modal models, such as CLIP, to provide image and text features in a…
External link:
http://arxiv.org/abs/2308.02487