Text-to-Image (T2I) Diffusion Models (DMs) excel at creating high-quality images from text descriptions but, like many deep learning models, suffer from robustness issues. While there are attempts to evaluate the robustness of T2I DMs as a binary or worst-case problem, they cannot answer how robust the model is in general whenever an adversarial example (AE) can be found. In this study, we first formalise a probabilistic notion of T2I DMs' robustness, and then devise an efficient framework, ProTIP, to evaluate it with statistical guarantees. The main challenges stem from: i) the high computational cost of the image generation process; and ii) the fact that identifying whether a perturbed input is an AE involves comparing two output distributions, which is fundamentally harder than in other DL tasks such as classification, where an AE is identified upon misprediction of labels. To tackle these challenges, we employ sequential analysis with efficacy and futility early stopping rules in the statistical testing for identifying AEs, and adaptive concentration inequalities to dynamically determine the "just-right" number of stochastic perturbations once the verification target is met. Empirical experiments validate ProTIP's effectiveness and efficiency, and showcase its application in ranking common defence methods.
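To illustrate the second ingredient above, here is a minimal sketch (not the authors' implementation) of how an adaptive, Hoeffding-style concentration inequality can decide the "just-right" number of stochastic perturbations: sampling stops as soon as the confidence half-width around the estimated AE probability drops below a target precision. The oracle `is_ae` is a hypothetical stand-in for the (expensive) statistical test that compares the two output distributions.

```python
import math
import random

def estimate_robustness(is_ae, delta=0.05, eps=0.05, max_n=100_000):
    """Monte Carlo estimate of p = P(perturbed input is an AE), with an
    adaptive stopping rule: stop once the (1 - delta) Hoeffding-style
    confidence half-width falls below eps. `is_ae` is a hypothetical
    oracle returning 1 if a freshly sampled perturbation is an AE."""
    hits, n = 0, 0
    while n < max_n:
        n += 1
        hits += is_ae()
        # Half-width valid uniformly over n via a crude union bound;
        # tighter adaptive bounds would stop earlier.
        half_width = math.sqrt(math.log(2 * n * (n + 1) / delta) / (2 * n))
        if half_width <= eps:
            break
    return hits / n, n

# Usage: simulate the AE oracle as a Bernoulli(0.1) draw.
random.seed(0)
p_hat, n_used = estimate_robustness(lambda: int(random.random() < 0.1))
```

In the real framework each call to the oracle triggers image generation and a sequential hypothesis test with efficacy/futility early stopping, so reducing the number of oracle calls is exactly what makes the evaluation tractable.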