Diffusion models have achieved great success in generating diverse, high-fidelity images, yet their widespread application, especially in real-time scenarios, is hampered by their inherently slow generation speed. This slowness stems from the need for multi-step network inference. While some predictions benefit from the model's full computation in a given sampling iteration, not every iteration requires the same amount of computation, so applying the full model at every step can waste computation. Unlike typical adaptive computation problems, which deal with single-step generation, multi-step diffusion processes must dynamically adjust their allocation of computational resources based on an ongoing assessment of each step's importance to the final image, which presents a unique set of challenges. In this work, we propose AdaDiff, an adaptive computation framework that dynamically allocates computational resources at each sampling step to improve the generation efficiency of diffusion models. To assess how changes in computational effort affect image quality, we present a timestep-aware uncertainty estimation module (UEM) designed for diffusion models. Integrated at each intermediate layer, the UEM evaluates the predictive uncertainty of that layer. This uncertainty measurement serves as a crucial indicator for deciding whether to exit the inference process early. Additionally, we introduce an uncertainty-aware layer-wise loss aimed at bridging the performance gap between full models and their early-exited counterparts. With this loss, our model achieves results on par with full-layer models. Comprehensive experiments on class-conditional, unconditional, and text-guided image generation across multiple datasets show that our approach offers superior performance and efficiency relative to current early-exiting techniques for diffusion models.
Notably, we observe improved FID together with an acceleration ratio of around 45%. Another exciting observation is that adaptive computation can synergize with other efficiency-enhancing methods, such as reducing the number of sampling steps and weight pruning, to further accelerate inference and boost performance. Full code and model are released for reproduction.
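
To make the early-exit idea concrete, the following is a minimal sketch of uncertainty-gated inference within a single sampling step. It is not the released AdaDiff code. The function name `early_exit_forward`, the toy layers, the toy estimators, and the rule "exit when the estimated uncertainty falls below a fixed threshold" are all assumptions made for illustration; the abstract states only that a timestep-aware UEM at each intermediate layer informs the exit decision.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Layer = Callable[[Vector], Vector]
# Hypothetical UEM interface: (hidden state, normalized timestep) -> uncertainty.
Estimator = Callable[[Vector, float], float]


def early_exit_forward(
    hidden: Vector,
    timestep: float,
    layers: Sequence[Layer],
    estimators: Sequence[Estimator],
    threshold: float,
) -> Tuple[Vector, int]:
    """Run layers in order and stop once the estimated uncertainty is low.

    Returns the hidden state at the exit point and the number of layers used.
    The thresholding rule is an assumption, not a detail taken from the paper.
    """
    if len(layers) != len(estimators):
        raise ValueError("expected one uncertainty estimator per layer")

    last = len(layers) - 1
    for index, (layer, estimator) in enumerate(zip(layers, estimators)):
        hidden = layer(hidden)
        # The final layer always returns, so it needs no exit check.
        if index < last and estimator(hidden, timestep) < threshold:
            return hidden, index + 1
    return hidden, len(layers)


def add_one(h: Vector) -> Vector:
    """Toy stand-in for a backbone layer."""
    return [x + 1.0 for x in h]


def timestep_as_uncertainty(h: Vector, t: float) -> float:
    """Toy stand-in for a UEM: uncertainty simply equals the timestep."""
    return t


toy_layers = [add_one, add_one, add_one]
toy_estimators = [timestep_as_uncertainty] * 3

# High-uncertainty step: no intermediate layer passes the exit check.
out_full, used_full = early_exit_forward([0.0], 0.9, toy_layers, toy_estimators, 0.5)
# Low-uncertainty step: the first layer already passes the exit check.
out_early, used_early = early_exit_forward([0.0], 0.1, toy_layers, toy_estimators, 0.5)
```

In this toy setup the step with timestep 0.9 uses all three layers, while the step with timestep 0.1 exits after one. A learned estimator would replace `timestep_as_uncertainty`, and the threshold would trade image quality against computation.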