Showing 1 - 10 of 40 for search: '"Fang, Shancheng"'
Automatic live video commenting has been attracting increasing attention due to its significance in narration generation, topic explanation, etc. However, current methods do not take the diverse sentiments of the generated comments into account. …
External link:
http://arxiv.org/abs/2404.12782
Author:
Qi, Tianhao, Fang, Shancheng, Wu, Yanze, Xie, Hongtao, Liu, Jiawei, Chen, Lang, He, Qian, Zhang, Yongdong
Diffusion-based text-to-image models harbor immense potential for transferring reference styles. However, current encoder-based approaches significantly impair the text controllability of text-to-image models while transferring styles. …
External link:
http://arxiv.org/abs/2403.06951
Author:
Chen, Zhuowei, Fang, Shancheng, Liu, Wei, He, Qian, Huang, Mengqi, Zhang, Yongdong, Mao, Zhendong
While large-scale pre-trained text-to-image models can synthesize diverse and high-quality human-centric images, an intractable problem is how to preserve the face identity for conditioned face images. Existing methods either require time-consuming …
External link:
http://arxiv.org/abs/2307.00300
Design Booster: A Text-Guided Diffusion Model for Image Translation with Spatial Layout Preservation
Diffusion models are able to generate photorealistic images in arbitrary scenes. However, when applying diffusion models to image translation, there exists a trade-off between maintaining the spatial structure and producing high-quality content. …
External link:
http://arxiv.org/abs/2302.02284
Scene text spotting is of great importance to the computer vision community due to its wide variety of applications. Recent methods attempt to introduce linguistic knowledge for challenging recognition rather than pure visual classification. …
External link:
http://arxiv.org/abs/2211.10578
The Transformer-based encoder-decoder framework is becoming popular in scene text recognition, largely because it naturally integrates recognition clues from both the visual and semantic domains. However, recent studies show that the two kinds of clues …
External link:
http://arxiv.org/abs/2111.11011
In this paper, we abandon the dominant complex language model and rethink the linguistic learning process in scene text recognition. Different from previous methods, which consider the visual and linguistic information in two separate structures, we …
External link:
http://arxiv.org/abs/2108.09661
Scene text removal (STR) involves two processes: text localization and background reconstruction. By integrating both processes into a single network, previous methods provide an implicit erasure guidance by modifying all pixels in the entire image …
External link:
http://arxiv.org/abs/2106.13029
Linguistic knowledge is of great benefit to scene text recognition. However, how to effectively model linguistic rules in end-to-end deep networks remains a research challenge. In this paper, we argue that the limited capacity of language models comes …
External link:
http://arxiv.org/abs/2103.06495
Academic article (login required to view this result)