

Poster

Powerful and Flexible: Personalized Text-to-Image Generation via Reinforcement Learning

Fanyue Wei · Wei Zeng · Zhenyang Li · Dawei Yin · Lixin Duan · Wen Li

Strong Double Blind: This paper was not made available on public preprint services during the review process.
Fri 4 Oct 1:30 a.m. PDT — 3:30 a.m. PDT

Abstract:

Personalized text-to-image models allow users to generate images of an object (specified with a set of reference images) in varied styles (specified with a sentence). While remarkable results have been achieved with diffusion-based methods, the visual structure and details of the object are often unexpectedly changed during the diffusion process. One major reason is that these diffusion-based methods usually adopt a simple reconstruction objective during training, which can hardly enforce appropriate structural consistency between the generated image and the reference images. To this end, in this paper, we design a novel reinforcement learning framework for personalized text-to-image generation based on the deterministic policy gradient method, with which various objectives, differentiable or even non-differentiable, can easily be incorporated to supervise the diffusion models and improve the quality of the generated images. Experimental results on personalized text-to-image generation benchmark datasets show that our proposed approach surpasses existing state-of-the-art methods by a large margin in visual fidelity while preserving text alignment.
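To make the core idea concrete, below is a minimal, hypothetical sketch of a deterministic-policy-gradient update applied to a diffusion-style "policy", in the spirit the abstract describes: a critic is fit to an arbitrary (possibly non-differentiable) reward, and the denoiser is then updated by ascending the critic's estimate. All names here (DenoiserPolicy, Critic, reward_fn) and the toy latent dimensions are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch only: a deterministic-policy-gradient update for a
# diffusion-like denoiser. Module names, shapes, and the reward function
# are illustrative assumptions, not the paper's actual code.
import torch
import torch.nn as nn

class DenoiserPolicy(nn.Module):
    """Stand-in for a diffusion denoiser: deterministically maps a noisy
    latent (plus a scalar timestep) to a less-noisy latent."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 128), nn.SiLU(),
                                 nn.Linear(128, dim))

    def forward(self, x, t):
        t_emb = t.expand(x.size(0), 1)          # broadcast timestep per sample
        return self.net(torch.cat([x, t_emb], dim=-1))

class Critic(nn.Module):
    """Learned value model: predicts the reward of a generated latent,
    supplying gradients even when the reward itself is non-differentiable."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 128), nn.SiLU(),
                                 nn.Linear(128, 1))

    def forward(self, x):
        return self.net(x)

def reward_fn(x, ref):
    # Placeholder for any scalar objective, e.g. a structural-consistency
    # score against reference images; it need not be differentiable.
    return -(x - ref).abs().mean(dim=-1, keepdim=True)

policy, critic = DenoiserPolicy(), Critic()
opt_pi = torch.optim.Adam(policy.parameters(), lr=1e-4)
opt_q = torch.optim.Adam(critic.parameters(), lr=1e-3)
ref = torch.randn(8, 64)                        # surrogate reference latents

for step in range(100):
    x = torch.randn(8, 64)                      # noisy latents at some timestep
    t = torch.tensor([[0.5]])

    # 1) Fit the critic to observed rewards (no gradient through reward_fn).
    with torch.no_grad():
        gen = policy(x, t)
        r = reward_fn(gen, ref)
    q_loss = (critic(gen) - r).pow(2).mean()
    opt_q.zero_grad(); q_loss.backward(); opt_q.step()

    # 2) Deterministic policy gradient: maximize the critic's estimate,
    #    back-propagating through the deterministic denoising step.
    pi_loss = -critic(policy(x, t)).mean()
    opt_pi.zero_grad(); pi_loss.backward(); opt_pi.step()
```

The design point this sketch illustrates is the one the abstract emphasizes: because the policy update flows through the learned critic rather than the reward itself, any scoring function, differentiable or not, can supervise the generator.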
