Search results - "Xian, Yongqin"

Report

Toward a Diffusion-Based Generalist for Dense Vision Tasks

Authors: Fan, Yue; Xian, Yongqin; Zhai, Xiaohua; Kolesnikov, Alexander; Naeem, Muhammad Ferjad; Schiele, Bernt; Tombari, Federico

Building generalized models that can solve many computer vision tasks simultaneously is an intriguing direction. Recent works have shown image itself can be used as a natural interface for general-purpose visual perception and demonstrated inspiring

External link: http://arxiv.org/abs/2407.00503

Report

LocCa: Visual Pretraining with Location-aware Captioners

Authors: Wan, Bo; Tschannen, Michael; Xian, Yongqin; Pavetic, Filip; Alabdulmohsin, Ibrahim; Wang, Xiao; Pinto, André Susano; Steiner, Andreas; Beyer, Lucas; Zhai, Xiaohua

Image captioning has been shown as an effective pretraining method similar to contrastive pretraining. However, the incorporation of location-aware information into visual pretraining remains an area with limited research. In this paper, we propose a

External link: http://arxiv.org/abs/2403.19596

Report

Text-Conditioned Resampler For Long Form Video Understanding

Authors: Korbar, Bruno; Xian, Yongqin; Tonioni, Alessio; Zisserman, Andrew; Tombari, Federico

In this paper we present a text-conditioned video resampler (TCR) module that uses a pre-trained and frozen visual encoder and large language model (LLM) to process long video sequences for a task. TCR localises relevant visual features from the vide

External link: http://arxiv.org/abs/2312.11897

Report

LIME: Localized Image Editing via Attention Regularization in Diffusion Models

Authors: Simsar, Enis; Tonioni, Alessio; Xian, Yongqin; Hofmann, Thomas; Tombari, Federico

Diffusion models (DMs) have gained prominence due to their ability to generate high-quality, varied images, with recent advancements in text-to-image generation. The research focus is now shifting towards the controllability of DMs. A significant cha

External link: http://arxiv.org/abs/2312.09256

Report

PALM: Predicting Actions through Language Models

Authors: Kim, Sanghwan; Huang, Daoji; Xian, Yongqin; Hilliges, Otmar; Van Gool, Luc; Wang, Xi

Understanding human activity is a crucial yet intricate task in egocentric vision, a field that focuses on capturing visual perspectives from the camera wearer's viewpoint. Traditional methods heavily rely on representation learning that is trained o

External link: http://arxiv.org/abs/2311.17944

Report

SILC: Improving Vision Language Pretraining with Self-Distillation

Authors: Naeem, Muhammad Ferjad; Xian, Yongqin; Zhai, Xiaohua; Hoyer, Lukas; Van Gool, Luc; Tombari, Federico

Image-Text pretraining on web-scale image caption datasets has become the default recipe for open vocabulary classification and retrieval models thanks to the success of CLIP and its variants. Several works have also used CLIP features for dense pred

External link: http://arxiv.org/abs/2310.13355

Report

Detecting Adversarial Faces Using Only Real Face Self-Perturbations

Authors: Wang, Qian; Xian, Yongqin; Ling, Hefei; Zhang, Jinyuan; Lin, Xiaorui; Li, Ping; Chen, Jiazhong; Yu, Ning

Adversarial attacks aim to disturb the functionality of a target system by adding specific noise to the input samples, bringing potential threats to security and robustness when applied to facial recognition systems. Although existing defense techniq

External link: http://arxiv.org/abs/2304.11359

Report

Learning Prototype Classifiers for Long-Tailed Recognition

Authors: Sharma, Saurabh; Xian, Yongqin; Yu, Ning; Singh, Ambuj

The problem of long-tailed recognition (LTR) has received attention in recent years due to the fundamental power-law distribution of objects in the real-world. Most recent works in LTR use softmax classifiers that are biased in that they correlate cl

External link: http://arxiv.org/abs/2302.00491

Report

Urban Scene Semantic Segmentation with Low-Cost Coarse Annotation

Authors: Das, Anurag; Xian, Yongqin; He, Yang; Akata, Zeynep; Schiele, Bernt

For best performance, today's semantic segmentation methods use large and carefully labeled datasets, requiring expensive annotation budgets. In this work, we show that coarse annotation is a low-cost but highly effective alternative for training sem

External link: http://arxiv.org/abs/2212.07911

Report

CiaoSR: Continuous Implicit Attention-in-Attention Network for Arbitrary-Scale Image Super-Resolution

Authors: Cao, Jiezhang; Wang, Qin; Xian, Yongqin; Li, Yawei; Ni, Bingbing; Pi, Zhiming; Zhang, Kai; Zhang, Yulun; Timofte, Radu; Van Gool, Luc

Learning continuous image representations is recently gaining popularity for image super-resolution (SR) because of its ability to reconstruct high-resolution images with arbitrary scales from low-resolution inputs. Existing methods mostly ensemble n

External link: http://arxiv.org/abs/2212.04362
