Personalized and Sequential Text-to-Image Generation

Author:	Nabati, Ofir; Tennenholtz, Guy; Hsu, ChihWei; Ryu, Moonkyung; Ramachandran, Deepak; Chow, Yinlam; Li, Xiang; Boutilier, Craig
Year of publication:	2024
Subject:	Computer Science - Computer Vision and Pattern Recognition; Computer Science - Artificial Intelligence; Computer Science - Computation and Language; Computer Science - Machine Learning; Electrical Engineering and Systems Science - Systems and Control
Document type:	Working Paper
Description:	We address the problem of personalized, interactive text-to-image (T2I) generation, designing a reinforcement learning (RL) agent that iteratively improves a set of generated images for a user through a sequence of prompt expansions. Using human raters, we create a novel dataset of sequential preferences, which we leverage together with large-scale open-source (non-sequential) datasets. We construct user-preference and user-choice models using an expectation-maximization (EM) strategy and identify varying user preference types. We then leverage a large multimodal language model (LMM) and a value-based RL approach to suggest a personalized and diverse slate of prompt expansions to the user. Our Personalized And Sequential Text-to-image Agent (PASTA) extends T2I models with personalized multi-turn capabilities, fostering collaborative co-creation and addressing uncertainty or underspecification in a user's intent. We evaluate PASTA using human raters, showing significant improvement compared to baseline methods. We also release our sequential rater dataset and simulated user-rater interactions to support future research in personalized, multi-turn T2I generation. Comment: Link to PASTA dataset: https://www.kaggle.com/datasets/googleai/pasta-data
Database:	arXiv
External link:	http://arxiv.org/abs/2412.10419