Showing 1 - 10 of 111 for search: '"Yu, Qihang"'
This paper presents innovative enhancements to diffusion models by integrating a novel multi-resolution network and time-dependent layer normalization. Diffusion models have gained prominence for their effectiveness in high-fidelity image generation.
External link:
http://arxiv.org/abs/2406.09416
Recent advancements in generative models have highlighted the crucial role of image tokenization in the efficient synthesis of high-resolution images. Tokenization, which transforms images into latent representations, reduces computational demands co…
External link:
http://arxiv.org/abs/2406.07550
Recent advancements in zero-shot video diffusion models have shown promise for text-driven video editing, but challenges remain in achieving high temporal consistency. To address this, we introduce Video-3DGS, a 3D Gaussian Splatting (3DGS)-based vid…
External link:
http://arxiv.org/abs/2406.02541
Author:
Yu, Hongyun, Qu, Zhan, Yu, Qihang, Chen, Jianchuan, Jiang, Zhonghua, Chen, Zhiwen, Zhang, Shengyu, Xu, Jimin, Wu, Fei, Lv, Chengfei, Yu, Gang
Recent works on audio-driven talking head synthesis using Neural Radiance Fields (NeRF) have achieved impressive results. However, due to inadequate pose and expression control caused by NeRF implicit representation, these methods still have some lim…
External link:
http://arxiv.org/abs/2404.14037
In recent decades, the vision community has witnessed remarkable progress in visual recognition, partially owing to advancements in dataset benchmarks. Notably, the established COCO benchmark has propelled the development of modern detection and segm…
External link:
http://arxiv.org/abs/2404.08639
Recent breakthroughs in vision-language models (VLMs) start a new page in the vision community. The VLMs provide stronger and more generalizable feature embeddings compared to those from ImageNet-pretrained models, thanks to the training on the large…
External link:
http://arxiv.org/abs/2404.02132
Author:
He, Ju, Yu, Qihang, Shin, Inkyu, Deng, Xueqing, Yuille, Alan, Shen, Xiaohui, Chen, Liang-Chieh
Video segmentation requires consistently segmenting and tracking objects over time. Due to the quadratic dependency on input size, directly applying self-attention to video segmentation with high-resolution input features poses significant challenges…
External link:
http://arxiv.org/abs/2311.18537
Localizing and recognizing objects in the open-ended physical world poses a long-standing challenge within the domain of machine perception. Recent methods have endeavored to address the issue by employing a class-agnostic mask (or box) proposal mode…
External link:
http://arxiv.org/abs/2311.08400
Author:
Chen, Jieneng, Mei, Jieru, Li, Xianhang, Lu, Yongyi, Yu, Qihang, Wei, Qingyue, Luo, Xiangde, Xie, Yutong, Adeli, Ehsan, Wang, Yan, Lungren, Matthew, Xing, Lei, Lu, Le, Yuille, Alan, Zhou, Yuyin
Medical image segmentation plays a crucial role in advancing healthcare systems for disease diagnosis and treatment planning. The u-shaped architecture, popularly known as U-Net, has proven highly successful for various medical image segmentation tas…
External link:
http://arxiv.org/abs/2310.07781
Open-vocabulary segmentation is a challenging task requiring segmenting and recognizing objects from an open set of categories. One way to address this challenge is to leverage multi-modal models, such as CLIP, to provide image and text features in a…
External link:
http://arxiv.org/abs/2308.02487