Poster

Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models

Chao Gong · Kai Chen · Zhipeng Wei · Jingjing Chen · Yu-Gang Jiang

Strong blind review: This paper was not made available on public preprint services during the review process
Wed 2 Oct 1:30 a.m. PDT — 3:30 a.m. PDT

Abstract:

Text-to-image models encounter safety issues, including concerns related to copyright and Not-Safe-For-Work (NSFW) content. Although several methods have been proposed for erasing inappropriate concepts from diffusion models, they often exhibit incomplete erasure, lack robustness against red-teaming tools, and inadvertently damage generation ability. In this work, we introduce Reliable and Efficient Concept Erasure (RECE), a novel approach that modifies the model without requiring additional training. Specifically, RECE uses a closed-form solution to efficiently compute new embeddings that can regenerate erased concepts on the model that has undergone concept erasure. To mitigate the inappropriate content these derived embeddings may represent, RECE further projects them onto harmless concepts in the cross-attention layers. The derivation and erasure of new embeddings are carried out iteratively to achieve thorough erasure of inappropriate concepts. In addition, to preserve the model's generation ability, RECE adds a regularization term to the closed-form solution, which minimizes the impact on unrelated concepts during erasure. Benchmarked against previous approaches, our method achieves more efficient and thorough erasure with little damage to generation ability and demonstrates enhanced robustness against red-teaming tools.
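
The loop described in the abstract (edit the model in closed form, derive an embedding that regenerates the erased concept, erase that embedding too, repeat) can be sketched on a single linear projection. This is only an illustrative sketch under stated assumptions, not the authors' implementation: one matrix `W` stands in for a cross-attention key/value projection, both closed forms are ordinary ridge-regression solutions, and every name and default here (`edit_projection`, `derive_embedding`, `iterative_erase`, `lam`, `reg`, `rounds`) is hypothetical.

```python
import numpy as np


def edit_projection(W, c, v_target, keep, lam=0.1):
    """Closed-form edit of one projection matrix (illustrative assumption).

    Minimizes ||W' c - v_target||^2 + sum_p ||W' p - v_p||^2 + lam * ||W' - W||_F^2
    over W', where `keep` is a list of (p, v_p) pairs for concepts to preserve.
    """
    d = W.shape[1]
    A = np.outer(v_target, c) + lam * W
    B = np.outer(c, c) + lam * np.eye(d)
    for p, v_p in keep:
        A += np.outer(v_p, p)
        B += np.outer(p, p)
    # Solve W' B = A; B is symmetric positive definite because lam > 0.
    return np.linalg.solve(B, A.T).T


def derive_embedding(W_new, W_old, c_erase, reg=1e-3):
    """Embedding whose output under the edited weights mimics the erased concept.

    Minimizes ||W_new c' - W_old c_erase||^2 + reg * ||c'||^2 over c'.
    """
    d = W_new.shape[1]
    lhs = W_new.T @ W_new + reg * np.eye(d)
    rhs = W_new.T @ (W_old @ c_erase)
    return np.linalg.solve(lhs, rhs)


def iterative_erase(W_old, c_erase, c_target, preserve, rounds=3, lam=0.1, reg=1e-3):
    """Alternate between deriving a regenerating embedding and erasing it."""
    v_target = W_old @ c_target                 # output of the harmless concept
    keep = [(p, W_old @ p) for p in preserve]   # outputs to leave unchanged
    W = edit_projection(W_old, c_erase, v_target, keep, lam)
    for _ in range(rounds):
        c_adv = derive_embedding(W, W_old, c_erase, reg)
        W = edit_projection(W, c_adv, v_target, keep, lam)
    return W
```

In a real diffusion model the same edit would have to be applied to each cross-attention projection, with text-encoder embeddings in place of the random vectors one might use to try this sketch; the number of rounds and the regularization weights are free choices here, not values taken from the paper.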
