We present CLIP-IFDL, a novel image forgery detection and localization (IFDL) model that harnesses Contrastive Language-Image Pre-Training (CLIP). Directly incorporating CLIP into forgery detection, however, poses challenges, as CLIP lacks task-specific prompts and forgery awareness. To overcome these challenges, we tailor CLIP for forgery detection and localization via a noise-assisted prompt learning framework comprising instance-aware dual-stream prompt learning and a forgery-enhanced noise adapter. We first create a pair of learnable prompts as negative and positive samples in place of discrete prompts, then fine-tune these prompts conditioned on each image's features and category. We further constrain the text-image similarity between the prompts and their corresponding images to update the prompts. Moreover, we design a forgery-enhanced noise adapter that augments the image encoder's forgery perception through multi-domain fusion and zero linear layers. As a result, our method not only extracts pertinent features but also benefits from the generalizability of the open-world CLIP prior. Comprehensive experiments show that our method outperforms existing approaches in accuracy and generalizability while effectively reducing false alarms.