Showing 1 - 10 of 19 for search: '"Wu, Xiaoshi"'
Author:
Wu, Xiaoshi, Hao, Yiming, Zhang, Manyuan, Sun, Keqiang, Huang, Zhaoyang, Song, Guanglu, Liu, Yu, Li, Hongsheng
Optimizing a text-to-image diffusion model with a given reward function is an important but underexplored research area. In this study, we propose Deep Reward Tuning (DRTune), an algorithm that directly supervises the final output image of a text-to-image…
External link:
http://arxiv.org/abs/2405.00760
Author:
Jiang, Dongzhi, Song, Guanglu, Wu, Xiaoshi, Zhang, Renrui, Shen, Dazhong, Zong, Zhuofan, Liu, Yu, Li, Hongsheng
Diffusion models have demonstrated great success in the field of text-to-image generation. However, alleviating the misalignment between the text prompts and images is still challenging. The root reason behind the misalignment has not been extensively…
External link:
http://arxiv.org/abs/2404.03653
Author:
Li, Sicheng, Sun, Keqiang, Lai, Zhixin, Wu, Xiaoshi, Qiu, Feng, Xie, Haoran, Miyata, Kazunori, Li, Hongsheng
Conditional text-to-image diffusion models have garnered significant attention in recent years. However, the precision of these models is often compromised, mainly for two reasons: ambiguous condition input and inadequate condition guidance…
External link:
http://arxiv.org/abs/2403.18417
Author:
Wang, Fu-Yun, Wu, Xiaoshi, Huang, Zhaoyang, Shi, Xiaoyu, Shen, Dazhong, Song, Guanglu, Liu, Yu, Li, Hongsheng
Video outpainting is a challenging task, aiming at generating video content outside the viewport of the input video while maintaining inter-frame and intra-frame consistency. Existing methods fall short in either generation quality or flexibility. We…
External link:
http://arxiv.org/abs/2403.13745
Author:
Sun, Keqiang, Pan, Junting, Ge, Yuying, Li, Hao, Duan, Haodong, Wu, Xiaoshi, Zhang, Renrui, Zhou, Aojun, Qin, Zipeng, Wang, Yi, Dai, Jifeng, Qiao, Yu, Wang, Limin, Li, Hongsheng
While recent advancements in vision-language models have had a transformative impact on multi-modal comprehension, the extent to which these models possess the ability to comprehend generated images remains uncertain. Synthetic images, in comparison…
External link:
http://arxiv.org/abs/2307.00716
Recent text-to-image generative models can generate high-fidelity images from text inputs, but the quality of these generated images cannot be accurately evaluated by existing evaluation metrics. To address this issue, we introduce Human Preference D…
External link:
http://arxiv.org/abs/2306.09341
Recent years have witnessed a rapid growth of deep generative models, with text-to-image models gaining significant attention from the public. However, existing models often generate images that do not align well with human preferences, such as awkward…
External link:
http://arxiv.org/abs/2303.14420
Open-vocabulary detection (OVD) is an object detection task aiming at detecting objects from novel categories beyond the base categories on which the detector is trained. Recent OVD methods rely on large-scale visual-language pre-trained models, such as…
External link:
http://arxiv.org/abs/2303.13076
Author:
Zhu, Xizhou, Zhu, Jinguo, Li, Hao, Wu, Xiaoshi, Wang, Xiaogang, Li, Hongsheng, Wang, Xiaohua, Dai, Jifeng
Biological intelligence systems of animals perceive the world by integrating information in different modalities and processing it simultaneously for various tasks. In contrast, current machine learning research follows a task-specific paradigm, leading…
External link:
http://arxiv.org/abs/2112.01522
The abundance and richness of Internet photos of landmarks and cities has led to significant progress in 3D vision over the past two decades, including automated 3D reconstructions of the world's landmarks from tourist photos. However, a major source…
External link:
http://arxiv.org/abs/2108.05863