Showing 1 - 10 of 26 for search: '"Chang, Yuanqi"'
Story visualization, the task of generating coherent images based on a narrative, has seen significant advancements with the emergence of text-to-image models, particularly diffusion models. However, maintaining semantic consistency, generating high-
External link:
http://arxiv.org/abs/2410.06244
This paper proposes the first pure Transformer structure inversion network called SwinStyleformer, which can compensate for the shortcomings of the CNNs inversion framework by handling long-range dependencies and learning the global structure of obje
External link:
http://arxiv.org/abs/2406.13153
There are many excellent solutions in image restoration. However, most methods require training separate models to restore images with different types of degradation. Although existing all-in-one models effectively address multiple types of degradat
External link:
http://arxiv.org/abs/2406.12587
Masked autoencoders (MAEs) have displayed significant potential in the classification and semantic segmentation of medical images in the last year. Due to the high similarity of human tissues, even slight changes in medical images may represent disea
External link:
http://arxiv.org/abs/2305.05871
Compared to other severe weather image restoration tasks, single image desnowing is a more challenging task. This is mainly due to the diversity and irregularity of snow shape, which makes it extremely difficult to restore images in snowy scenes. Mor
External link:
http://arxiv.org/abs/2303.09988
Facial expression recognition (FER) plays an important role in a variety of real-world applications such as human-computer interaction. POSTER achieves the state-of-the-art (SOTA) performance in FER by effectively combining facial landmark and image
External link:
http://arxiv.org/abs/2301.12149
Facial expression recognition (FER) plays a significant role in the ubiquitous application of computer vision. We revisit this problem with a new perspective on whether it can acquire useful representations that improve FER performance in the image g
External link:
http://arxiv.org/abs/2211.13564
Compared with the vanilla transformer, the window-based transformer offers a better trade-off between accuracy and efficiency. Although the window-based transformer has made great progress, its long-range modeling capabilities are limited due to the
External link:
http://arxiv.org/abs/2211.06083
This paper explores improvements to the masked image modeling (MIM) paradigm. The MIM paradigm enables the model to learn the main object features of the image by masking the input image and predicting the masked part from the unmasked part. We found t
External link:
http://arxiv.org/abs/2205.10546
Published in:
In Pattern Recognition January 2025 157