Showing 1 - 10 of 19 for search: '"Wu, Xiaoshi"'
Author:
Wu, Xiaoshi, Hao, Yiming, Zhang, Manyuan, Sun, Keqiang, Huang, Zhaoyang, Song, Guanglu, Liu, Yu, Li, Hongsheng
Optimizing a text-to-image diffusion model with a given reward function is an important but underexplored research area. In this study, we propose Deep Reward Tuning (DRTune), an algorithm that directly supervises the final output image of a text-to-image…
External link:
http://arxiv.org/abs/2405.00760
Author:
Jiang, Dongzhi, Song, Guanglu, Wu, Xiaoshi, Zhang, Renrui, Shen, Dazhong, Zong, Zhuofan, Liu, Yu, Li, Hongsheng
Diffusion models have demonstrated great success in the field of text-to-image generation. However, alleviating the misalignment between the text prompts and images is still challenging. The root reason behind the misalignment has not been extensively…
External link:
http://arxiv.org/abs/2404.03653
Author:
Li, Sicheng, Sun, Keqiang, Lai, Zhixin, Wu, Xiaoshi, Qiu, Feng, Xie, Haoran, Miyata, Kazunori, Li, Hongsheng
Conditional text-to-image diffusion models have garnered significant attention in recent years. However, the precision of these models is often compromised, mainly for two reasons: ambiguous condition input and inadequate condition guidance…
External link:
http://arxiv.org/abs/2403.18417
Author:
Wang, Fu-Yun, Wu, Xiaoshi, Huang, Zhaoyang, Shi, Xiaoyu, Shen, Dazhong, Song, Guanglu, Liu, Yu, Li, Hongsheng
Video outpainting is a challenging task, aiming at generating video content outside the viewport of the input video while maintaining inter-frame and intra-frame consistency. Existing methods fall short in either generation quality or flexibility. We…
External link:
http://arxiv.org/abs/2403.13745
Author:
Sun, Keqiang, Pan, Junting, Ge, Yuying, Li, Hao, Duan, Haodong, Wu, Xiaoshi, Zhang, Renrui, Zhou, Aojun, Qin, Zipeng, Wang, Yi, Dai, Jifeng, Qiao, Yu, Wang, Limin, Li, Hongsheng
While recent advancements in vision-language models have had a transformative impact on multi-modal comprehension, the extent to which these models possess the ability to comprehend generated images remains uncertain. Synthetic images, in comparison…
External link:
http://arxiv.org/abs/2307.00716
Recent text-to-image generative models can generate high-fidelity images from text inputs, but the quality of these generated images cannot be accurately evaluated by existing evaluation metrics. To address this issue, we introduce Human Preference D…
External link:
http://arxiv.org/abs/2306.09341
Recent years have witnessed a rapid growth of deep generative models, with text-to-image models gaining significant attention from the public. However, existing models often generate images that do not align well with human preferences, such as awkward…
External link:
http://arxiv.org/abs/2303.14420
Open-vocabulary detection (OVD) is an object detection task aiming at detecting objects from novel categories beyond the base categories on which the detector is trained. Recent OVD methods rely on large-scale visual-language pre-trained models, such as…
External link:
http://arxiv.org/abs/2303.13076
Author:
Zhu, Xizhou, Zhu, Jinguo, Li, Hao, Wu, Xiaoshi, Wang, Xiaogang, Li, Hongsheng, Wang, Xiaohua, Dai, Jifeng
Biological intelligence systems of animals perceive the world by integrating information in different modalities and processing it simultaneously for various tasks. In contrast, current machine learning research follows a task-specific paradigm, leading…
External link:
http://arxiv.org/abs/2112.01522
The abundance and richness of Internet photos of landmarks and cities has led to significant progress in 3D vision over the past two decades, including automated 3D reconstructions of the world's landmarks from tourist photos. However, a major source…
External link:
http://arxiv.org/abs/2108.05863