Showing 1 - 10 of 116 for search: '"Chao, Hongyang"'
Author:
He, Huiguo, Yang, Huan, Tuo, Zixi, Zhou, Yuan, Wang, Qiuyue, Zhang, Yuhang, Liu, Zeyu, Huang, Wenhao, Chao, Hongyang, Yin, Jian
Story visualization aims to create visually compelling images or videos corresponding to textual narratives. Despite recent advances in diffusion models yielding promising results, existing methods still struggle to create a coherent sequence of …
External link:
http://arxiv.org/abs/2407.12899
Author:
Wu, Kan, Peng, Houwen, Zhou, Zhenghong, Xiao, Bin, Liu, Mengchen, Yuan, Lu, Xuan, Hong, Valenzuela, Michael, Xi, Chen, Wang, Xinggang, Chao, Hongyang, Hu, Han
In this paper, we propose a novel cross-modal distillation method, called TinyCLIP, for large-scale language-image pre-trained models. The method introduces two core techniques: affinity mimicking and weight inheritance. Affinity mimicking explores …
External link:
http://arxiv.org/abs/2309.12314
Author:
He, Huiguo, Wang, Tianfu, Yang, Huan, Fu, Jianlong, Yuan, Nicholas Jing, Yin, Jian, Chao, Hongyang, Zhang, Qi
We study the task of generating profitable Non-Fungible Token (NFT) images from user-input texts. Recent advances in diffusion models have shown great potential for image generation. However, existing works can fall short in generating visually pleasing …
External link:
http://arxiv.org/abs/2306.11731
Recent advances in text-to-image generation have witnessed the rise of diffusion models, which act as powerful generative models. Nevertheless, it is not trivial to exploit such latent variable models to capture the dependency among discrete words and …
External link:
http://arxiv.org/abs/2212.03099
Outlier detection plays a critical role in AI safety, yet remains highly challenging. Observations show that deep neural network classifiers usually tend to incorrectly classify out-of-distribution (OOD) inputs …
External link:
http://arxiv.org/abs/2209.12807
CoCo-BERT: Improving Video-Language Pre-training with Contrastive Cross-modal Matching and Denoising
BERT-type structures have led to a revolution in vision-language pre-training and state-of-the-art results on numerous vision-language downstream tasks. Existing solutions dominantly capitalize on multi-modal inputs with mask …
External link:
http://arxiv.org/abs/2112.07515
Localizing text instances in natural scenes is regarded as a fundamental challenge in computer vision. Nevertheless, owing to the extremely varied aspect ratios and scales of text instances in real scenes, most conventional text detectors suffer from …
External link:
http://arxiv.org/abs/2112.07513
Author:
Chen, Minghao, Wu, Kan, Ni, Bolin, Peng, Houwen, Liu, Bei, Fu, Jianlong, Chao, Hongyang, Ling, Haibin
Vision Transformers have shown great visual representation power on substantial vision tasks such as recognition and detection, and have thus attracted fast-growing efforts toward manually designing more effective architectures. In this paper, we propose …
External link:
http://arxiv.org/abs/2111.14725
We present a new perspective on achieving image synthesis by viewing this task as a visual token generation problem. Different from existing paradigms that directly synthesize a full image from a single input (e.g., a latent code), the new formulation …
External link:
http://arxiv.org/abs/2111.03481
Published in:
IEEE Transactions on Image Processing, vol. 30, pp. 6637-6647, 2021
Defect detection can be regarded as a realistic scenario of object detection in the computer vision field, and it is widely used in industry. Directly applying a vanilla object detector to the defect detection task can achieve promising …
External link:
http://arxiv.org/abs/2108.04456