Showing 1 - 10 of 60 for search: '"Xian, Yongqin"'
Authors:
Fan, Yue, Xian, Yongqin, Zhai, Xiaohua, Kolesnikov, Alexander, Naeem, Muhammad Ferjad, Schiele, Bernt, Tombari, Federico
Building generalized models that can solve many computer vision tasks simultaneously is an intriguing direction. Recent works have shown image itself can be used as a natural interface for general-purpose visual perception and demonstrated inspiring
External link:
http://arxiv.org/abs/2407.00503
Authors:
Wan, Bo, Tschannen, Michael, Xian, Yongqin, Pavetic, Filip, Alabdulmohsin, Ibrahim, Wang, Xiao, Pinto, André Susano, Steiner, Andreas, Beyer, Lucas, Zhai, Xiaohua
Image captioning has been shown as an effective pretraining method similar to contrastive pretraining. However, the incorporation of location-aware information into visual pretraining remains an area with limited research. In this paper, we propose a
External link:
http://arxiv.org/abs/2403.19596
In this paper we present a text-conditioned video resampler (TCR) module that uses a pre-trained and frozen visual encoder and large language model (LLM) to process long video sequences for a task. TCR localises relevant visual features from the vide
External link:
http://arxiv.org/abs/2312.11897
Diffusion models (DMs) have gained prominence due to their ability to generate high-quality, varied images, with recent advancements in text-to-image generation. The research focus is now shifting towards the controllability of DMs. A significant cha
External link:
http://arxiv.org/abs/2312.09256
Understanding human activity is a crucial yet intricate task in egocentric vision, a field that focuses on capturing visual perspectives from the camera wearer's viewpoint. Traditional methods heavily rely on representation learning that is trained o
External link:
http://arxiv.org/abs/2311.17944
Authors:
Naeem, Muhammad Ferjad, Xian, Yongqin, Zhai, Xiaohua, Hoyer, Lukas, Van Gool, Luc, Tombari, Federico
Image-Text pretraining on web-scale image caption datasets has become the default recipe for open vocabulary classification and retrieval models thanks to the success of CLIP and its variants. Several works have also used CLIP features for dense pred
External link:
http://arxiv.org/abs/2310.13355
Authors:
Wang, Qian, Xian, Yongqin, Ling, Hefei, Zhang, Jinyuan, Lin, Xiaorui, Li, Ping, Chen, Jiazhong, Yu, Ning
Adversarial attacks aim to disturb the functionality of a target system by adding specific noise to the input samples, bringing potential threats to security and robustness when applied to facial recognition systems. Although existing defense techniq
External link:
http://arxiv.org/abs/2304.11359
The problem of long-tailed recognition (LTR) has received attention in recent years due to the fundamental power-law distribution of objects in the real-world. Most recent works in LTR use softmax classifiers that are biased in that they correlate cl
External link:
http://arxiv.org/abs/2302.00491
For best performance, today's semantic segmentation methods use large and carefully labeled datasets, requiring expensive annotation budgets. In this work, we show that coarse annotation is a low-cost but highly effective alternative for training sem
External link:
http://arxiv.org/abs/2212.07911
Authors:
Cao, Jiezhang, Wang, Qin, Xian, Yongqin, Li, Yawei, Ni, Bingbing, Pi, Zhiming, Zhang, Kai, Zhang, Yulun, Timofte, Radu, Van Gool, Luc
Learning continuous image representations is recently gaining popularity for image super-resolution (SR) because of its ability to reconstruct high-resolution images with arbitrary scales from low-resolution inputs. Existing methods mostly ensemble n
External link:
http://arxiv.org/abs/2212.04362