Showing 1 - 4 of 4
for search: '"Qin, Luozheng"'
The quality of video-text pairs fundamentally determines the upper bound of text-to-video models. Currently, the datasets used for training these models suffer from significant shortcomings, including low temporal consistency, poor-quality captions, …
External link:
http://arxiv.org/abs/2408.02629
The recent advancements in text-to-image generative models have been remarkable. Yet, the field suffers from a lack of evaluation metrics that accurately reflect the performance of these models, particularly lacking fine-grained metrics that can guide …
External link:
http://arxiv.org/abs/2406.16562
ChatGPT is instruction-tuned through Reinforcement Learning from Human Feedback (RLHF) to generate general, human-expected content aligned with human preferences, which can leave its responses insufficiently salient. Therefore, in this case, …
External link:
http://arxiv.org/abs/2406.01070
Author:
Tan, Zhiyu, Yang, Mengping, Qin, Luozheng, Yang, Hao, Qian, Ye, Zhou, Qiang, Zhang, Cheng, Li, Hao
One critical prerequisite for faithful text-to-image generation is the accurate understanding of text inputs. Existing methods leverage the text encoder of the CLIP model to represent input prompts. However, the pre-trained CLIP model can merely encode …
External link:
http://arxiv.org/abs/2405.12914