Poster
PreciseControl: Enhancing Text-To-Image Diffusion Models with Fine-Grained Attribute Control
Rishubh Parihar · Sachidanand VS · Sabariswaran Mani · Tejan Karmali · Venkatesh Babu Radhakrishnan
# 315
Strong Double Blind
Tue 1 Oct, 7:30 a.m.–9:30 a.m. PDT
Abstract:
Recently, we have seen a surge of personalization methods for text-to-image (T2I) diffusion models that learn a concept from a few images. Existing approaches, when used for face personalization, struggle to achieve convincing inversion with identity preservation and rely on semantic text-based editing of the generated face. However, more fine-grained control is desired for facial attribute editing, which is challenging to achieve with text prompts alone. In contrast, StyleGAN models learn a rich face prior and enable smooth, fine-grained attribute editing through latent manipulation. This work uses the disentangled W+ space of StyleGANs to condition the T2I model. This approach allows us to precisely manipulate facial attributes, such as smoothly introducing a smile, while preserving the coarse text-based control inherent in T2I models. To enable conditioning of the T2I model on the W+ space, we train a latent mapper that translates latent codes from W+ to the token embedding space of the T2I model. The proposed approach excels in the precise inversion of face images with attribute preservation and facilitates continuous control for fine-grained attribute editing. Furthermore, our approach can be readily extended to generate compositions involving multiple individuals. We perform extensive experiments to validate our method for face personalization and fine-grained attribute editing.
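The abstract does not specify the mapper's architecture, so the sketch below is only a rough illustration of the idea: a small MLP that maps a StyleGAN2-style W+ code (18 style vectors of size 512) to a couple of 768-dimensional token embeddings, which could then stand in for a placeholder token in the T2I model's prompt. All layer sizes, dimensions, and the number of predicted tokens are assumptions, not the paper's actual design.

```python
import torch
import torch.nn as nn

class LatentMapper(nn.Module):
    """Hypothetical sketch of a W+ -> token-embedding mapper.

    Assumed dimensions: StyleGAN2 at 1024x1024 uses 18 style vectors of
    size 512; a Stable-Diffusion-style text encoder uses 768-dim token
    embeddings. The paper's actual mapper may differ.
    """

    def __init__(self, num_styles=18, w_dim=512, num_tokens=2, token_dim=768):
        super().__init__()
        self.num_tokens = num_tokens
        self.token_dim = token_dim
        self.net = nn.Sequential(
            nn.Linear(num_styles * w_dim, 1024),
            nn.LeakyReLU(0.2),
            nn.Linear(1024, 1024),
            nn.LeakyReLU(0.2),
            nn.Linear(1024, num_tokens * token_dim),
        )

    def forward(self, w_plus):
        # w_plus: (batch, num_styles, w_dim) -> (batch, num_tokens, token_dim)
        x = w_plus.flatten(start_dim=1)
        tokens = self.net(x)
        return tokens.view(-1, self.num_tokens, self.token_dim)


# Usage sketch: the predicted embeddings would replace a placeholder token
# (e.g. "<person>") in the prompt before the text encoder / cross-attention,
# so edits made in W+ (such as adding a smile) carry over to the generation.
w_plus = torch.randn(1, 18, 512)          # inverted or sampled W+ code
person_tokens = LatentMapper()(w_plus)    # (1, 2, 768) token embeddings
```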