Showing 1 - 10 of 272 for search: '"Mao, Jiawei"'
This paper proposes the first pure Transformer structure inversion network called SwinStyleformer, which can compensate for the shortcomings of the CNNs inversion framework by handling long-range dependencies and learning the global structure of obje
External link:
http://arxiv.org/abs/2406.13153
There are many excellent solutions in image restoration. However, most methods require training separate models to restore images with different types of degradation. Although existing all-in-one models effectively address multiple types of degradat
External link:
http://arxiv.org/abs/2406.12587
Masked autoencoders (MAEs) have displayed significant potential in the classification and semantic segmentation of medical images in the last year. Due to the high similarity of human tissues, even slight changes in medical images may represent disea
External link:
http://arxiv.org/abs/2305.05871
Compared to other severe weather image restoration tasks, single image desnowing is a more challenging task. This is mainly due to the diversity and irregularity of snow shape, which makes it extremely difficult to restore images in snowy scenes. Mor
External link:
http://arxiv.org/abs/2303.09988
Facial expression recognition (FER) plays an important role in a variety of real-world applications such as human-computer interaction. POSTER achieves state-of-the-art (SOTA) performance in FER by effectively combining facial landmark and image
External link:
http://arxiv.org/abs/2301.12149
Vision Transformers (ViTs) outperform convolutional neural networks (CNNs) in several vision tasks thanks to their global modeling capabilities. However, ViTs lack the inductive bias inherent to convolution, so they require a large amount of data for trai
External link:
http://arxiv.org/abs/2212.05677
Facial expression recognition (FER) plays a significant role in the ubiquitous application of computer vision. We revisit this problem with a new perspective on whether it can acquire useful representations that improve FER performance in the image g
External link:
http://arxiv.org/abs/2211.13564
Some self-supervised cross-modal learning approaches have recently demonstrated the potential of image signals for enhancing point cloud representation. However, it remains an open question how to directly model cross-modal local and global corresponden
External link:
http://arxiv.org/abs/2211.12032
Compared with the vanilla transformer, the window-based transformer offers a better trade-off between accuracy and efficiency. Although the window-based transformer has made great progress, its long-range modeling capabilities are limited due to the
External link:
http://arxiv.org/abs/2211.06083
This paper explores improvements to the masked image modeling (MIM) paradigm. The MIM paradigm enables the model to learn the main object features of an image by masking the input image and predicting the masked part from the unmasked part. We found t
External link:
http://arxiv.org/abs/2205.10546