Poster

MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization

Tianchen Zhao · Xuefei Ning · Tongcheng Fang · Enshu Liu · Guyue Huang · Zinan Lin · Shengen Yan · Guohao Dai · Yu Wang

Fri 4 Oct 1:30 a.m. PDT — 3:30 a.m. PDT

Abstract:

Few-step diffusion models, which enable high-quality text-to-image generation with only a few denoising steps, have substantially reduced inference time. However, their considerable memory consumption (5-10 GB) still limits practical deployment on mobile devices. Post-Training Quantization (PTQ) is an effective way to reduce both memory and computational cost, but when applied to few-step diffusion models, existing methods designed for multi-step diffusion struggle to preserve both visual quality and text alignment. In this paper, we discover that quantization performance is bottlenecked by a small number of highly sensitive layers. Consequently, we introduce a mixed-precision quantization method, MixDQ. First, we identify that the high sensitivity of some layers stems from outliers in the text embeddings, and design a specialized Begin-Of-Sentence (BOS)-aware quantization scheme to address this issue. Next, we analyze the drawbacks of existing sensitivity metrics and introduce a metric-decoupled sensitivity analysis that accurately estimates sensitivity with respect to both image quality and content. Finally, we develop an integer-programming-based method to obtain the optimal mixed-precision configuration. On the challenging 1-step Stable Diffusion XL text-to-image task, existing quantization methods fall short even at W8A8, whereas MixDQ achieves W3.6A16 and W4A8 quantization with negligible degradation in both visual quality and text alignment. Compared with FP16, it achieves a 3-4x reduction in model size and memory cost, along with a 1.5x latency speedup.
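
To make the BOS-aware idea concrete, here is a minimal sketch assuming a CLIP-style text encoder whose first token is BOS and a generic symmetric round-to-nearest quantizer; the function names and quantizer details are our illustrative assumptions, not the authors' released code:

```python
import torch

def fake_quantize(x: torch.Tensor, n_bits: int = 8) -> torch.Tensor:
    # Generic symmetric per-tensor quantize-dequantize (round to nearest).
    qmax = 2 ** (n_bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax
    return torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale

def bos_aware_quantize(text_emb: torch.Tensor, n_bits: int = 8) -> torch.Tensor:
    # text_emb: [batch, seq_len, dim] text-encoder output; token 0 is BOS.
    # The BOS embedding carries large-magnitude outliers that would inflate
    # the quantization scale for every other token, so it is kept in full
    # precision while the remaining tokens are quantized normally.
    bos = text_emb[:, :1, :]
    rest = fake_quantize(text_emb[:, 1:, :], n_bits)
    return torch.cat([bos, rest], dim=1)
```

Splitting off a single token costs almost nothing at inference time but prevents one outlier vector from dominating the shared quantization scale.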
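The bit-width allocation step can be read as a small 0/1 integer program: pick exactly one bit-width per layer to minimize total sensitivity under a model-size budget. Below is a minimal sketch using SciPy's MILP solver; the function name, input layout, and exact objective/constraint form are our assumptions, since the abstract only states that an integer-programming formulation is used:

```python
import numpy as np
from scipy.optimize import milp, Bounds, LinearConstraint

def allocate_bitwidths(sens, n_params, bit_choices, budget_bits):
    # sens[l, b]: precomputed sensitivity of layer l at bit_choices[b]
    #             (e.g. from a metric-decoupled sensitivity analysis).
    # n_params[l]: parameter count of layer l.
    # Returns one chosen bit-width per layer.
    L, B = sens.shape
    c = sens.flatten()  # objective: total sensitivity of the selection

    # Exactly one bit-width selected per layer.
    A_onehot = np.kron(np.eye(L), np.ones((1, B)))
    one_per_layer = LinearConstraint(A_onehot, lb=1, ub=1)

    # Total weight storage must fit the bit budget.
    bits = np.asarray(bit_choices, dtype=float)
    cost = (np.asarray(n_params, dtype=float)[:, None] * bits).flatten()
    memory = LinearConstraint(cost[None, :], ub=budget_bits)

    res = milp(c, integrality=np.ones_like(c),
               bounds=Bounds(0, 1),
               constraints=[one_per_layer, memory])
    choice = res.x.reshape(L, B).argmax(axis=1)
    return [bit_choices[b] for b in choice]
```

Under a budget averaging to roughly 3.6 bits per weight, a solver like this would yield a per-layer configuration of the kind the paper reports as W3.6A16.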
