Text-to-image models face safety issues, including concerns related to copyright and Not-Safe-For-Work (NSFW) content. Although several methods have been proposed for erasing inappropriate concepts from diffusion models, they often exhibit incomplete erasure, lack robustness against red-teaming tools, and inadvertently damage generation ability. In this work, we introduce Reliable and Efficient Concept Erasure (RECE), a novel approach that modifies the model without requiring additional training. Specifically, RECE leverages a closed-form solution to efficiently compute new embeddings capable of regenerating erased concepts within the edited model. To mitigate the inappropriate content these derived embeddings may represent, RECE further aligns them with harmless concepts in the cross-attention layers. The derivation and erasure of new representation embeddings are performed iteratively to achieve thorough erasure of inappropriate concepts. In addition, to preserve the model's generation ability, RECE introduces a regularization term into the closed-form solution, minimizing the impact on unrelated concepts during the erasure process. Benchmarked against previous approaches, our method achieves more efficient and thorough erasure with minor damage to generation ability and demonstrates enhanced robustness against red-teaming tools.
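The sketch below illustrates the kind of alternating procedure the abstract describes: a regularized closed-form step that derives an embedding able to regenerate the erased concept under the edited weights, followed by a closed-form weight edit that maps this embedding to a harmless concept. All function names, dimensions, and regularizer values here are illustrative assumptions, not the paper's exact formulation; consult the RECE paper for the actual closed forms.

```python
import torch

def derive_adversarial_embedding(W_edit, W_orig, c_target, lam=1e-1):
    """Illustrative closed-form step: find c' whose cross-attention output
    under the edited weights W_edit best reproduces the erased concept's
    output under the original weights W_orig, with a Tikhonov regularizer
    (weight `lam`) limiting the impact on unrelated concepts.

    Minimizes ||W_edit c' - W_orig c_target||^2 + lam ||c'||^2, whose
    solution is c' = (W_edit^T W_edit + lam I)^{-1} W_edit^T W_orig c_target.
    """
    d = W_edit.shape[1]
    A = W_edit.T @ W_edit + lam * torch.eye(d)
    b = W_edit.T @ (W_orig @ c_target)
    return torch.linalg.solve(A, b)

def erase_step(W, W_orig, c_adv, c_safe, reg=1e-1):
    """Illustrative closed-form weight edit: map the derived embedding to a
    harmless concept's original output while a regularization term keeps W
    close to its current value.

    Minimizes ||W' c_adv - W_orig c_safe||^2 + reg ||W' - W||_F^2, giving
    the rank-one update W' = W + (v - W c_adv) c_adv^T / (||c_adv||^2 + reg).
    """
    v = W_orig @ c_safe                        # harmless target value
    denom = c_adv @ c_adv + reg
    return W + torch.outer(v - W @ c_adv, c_adv) / denom

# Iterative erasure loop (dimensions chosen for illustration only):
d_in, d_out = 768, 320                          # text-embedding / value dims
W_orig = torch.randn(d_out, d_in)               # stands in for a K/V projection
W = W_orig.clone()                              # weights being edited
c_erase = torch.randn(d_in)                     # embedding of concept to erase
c_safe = torch.randn(d_in)                      # embedding of harmless anchor

for _ in range(5):
    c_adv = derive_adversarial_embedding(W, W_orig, c_erase)
    W = erase_step(W, W_orig, c_adv, c_safe)
```

In this reading, each iteration both closes a regeneration loophole (the derived embedding) and re-anchors it to harmless content, while the regularizers in both closed forms keep the edit from degrading unrelated generation.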