This paper presents a novel approach to real-time image editing that leverages few-shot diffusion models. We demonstrate that disentangled controls can be readily achieved in a few-shot diffusion model by conditioning on a detailed text prompt. Our method generates a source image by fixing the random seed and using a detailed text prompt, then modifies a single attribute in the prompt to regenerate the target image. We observe that the source and target images are nearly identical, differing only in the modified attribute. Additionally, we introduce an iterative image inversion technique. The inversion network is conditioned on the input image and on the reconstruction from the previous step, allowing it to progressively correct the reconstruction toward the input image. The input image's content is captured by the detailed text prompt and four levels of noise maps. To manipulate the inverted image, we freeze the noise maps and modify one attribute in the text prompt, generating a new image that resembles the input with only that attribute changed. Our method runs in milliseconds for both inversion and editing, achieving real-time performance.
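
A minimal sketch of the fixed-seed, single-attribute editing idea, using the public Hugging Face diffusers Stable Diffusion pipeline as a stand-in for the paper's few-shot diffusion model; the prompts, model checkpoint, and seed value are illustrative assumptions, not the authors' settings.

```python
# Sketch: fix the seed, swap one attribute in a detailed prompt, regenerate.
# Stable Diffusion stands in for the paper's few-shot diffusion model.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

seed = 1234  # fixing the seed keeps the initial latent noise identical across runs

source_prompt = ("a photo of a woman with long black hair, blue eyes, "
                 "wearing a red dress, neutral background, soft lighting")
# Change exactly one attribute in the detailed prompt.
target_prompt = source_prompt.replace("long black hair", "short blonde hair")

generator = torch.Generator(device="cuda").manual_seed(seed)
source = pipe(source_prompt, generator=generator).images[0]

generator = torch.Generator(device="cuda").manual_seed(seed)  # reuse the same seed
target = pipe(target_prompt, generator=generator).images[0]

# With a sufficiently detailed prompt, `source` and `target` should be
# nearly identical apart from the edited attribute.
```

The detail of the prompt is what pins down the remaining attributes: the more attributes the text specifies explicitly, the less the single edit can leak into unrelated parts of the image.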
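The iterative inversion described above can be sketched as a simple refinement loop. Here `InversionNetwork`-style conditioning, the `generate` call, and the step count are hypothetical placeholders for illustration; the paper's actual architecture and training procedure are not specified here.

```python
# Hedged sketch of the iterative inversion loop: at each step the network is
# conditioned on the input image and the previous reconstruction, and outputs
# a detailed prompt embedding plus four levels of noise maps. `net` and
# `diffusion.generate` are assumed interfaces, not the authors' API.
import torch

def invert(net, diffusion, input_image, num_steps=4):
    """Iteratively refine (prompt embedding, noise maps) so the diffusion
    model's reconstruction converges to `input_image`."""
    recon = torch.zeros_like(input_image)  # initial (empty) reconstruction
    prompt_embed, noise_maps = None, None
    for _ in range(num_steps):
        # Seeing both the target and the current reconstruction lets each
        # step correct the residual error of the previous one.
        prompt_embed, noise_maps = net(input_image, recon)  # 4 noise levels
        recon = diffusion.generate(prompt_embed, noise_maps)
    return prompt_embed, noise_maps

# Editing after inversion: freeze `noise_maps`, modify one attribute in the
# prompt, and regenerate; only that attribute should change in the output.
```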