The Segment Anything Model (SAM) is a promising prompt-guided vision foundation model for segmenting objects of interest. However, SAM's extensive computational requirements have limited its applicability to resource-constrained edge devices. Post-training quantization (PTQ) is an effective approach for the fast deployment of SAM. Nevertheless, SAM's billion-scale pretraining yields a highly asymmetric activation distribution with detrimental outliers in a large number of channels, causing significant performance degradation under low-bit PTQ. In this paper, we propose PQ-SAM, the first PTQ method customized for SAM. To achieve a quantization-friendly tensor-wise distribution, PQ-SAM incorporates a novel grouped activation distribution transformation (GADT) based on a two-stage outlier hierarchical clustering (OHC) scheme that scales and shifts each channel. First, OHC identifies and truncates extreme outliers to reduce the scale variance across channels. Second, OHC iteratively allocates learnable shifting and scaling sizes to each group of channels with similar distributions, reducing the number of learnable parameters and easing optimization. These shifting and scaling sizes adjust the activation channels and are jointly optimized with the quantization step sizes for optimal results. Extensive experiments demonstrate that PQ-SAM outperforms existing PTQ methods on nine zero-shot datasets and pushes 4-bit PTQ of SAM to a usable level.
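The abstract does not include implementation details, but the channel-wise transform it describes can be sketched in PyTorch as follows. This is a minimal illustration, not the paper's method: the k-sigma truncation rule, the quantile-based channel bucketing, and the LSQ-style learnable step size with a straight-through estimator are stand-ins for OHC's actual criteria, and all names (`truncate_outliers`, `group_channels`, `GroupedShiftScale`) are hypothetical.

```python
import torch


def truncate_outliers(x, k=3.0):
    # Stage 1 (sketch): clamp extreme per-channel outliers to reduce the
    # scale variance across channels. The k-sigma rule is an assumption;
    # the abstract does not specify OHC's truncation criterion.
    mu = x.mean(dim=0, keepdim=True)
    sigma = x.std(dim=0, keepdim=True)
    return x.clamp(mu - k * sigma, mu + k * sigma)


def group_channels(x, num_groups=8):
    # Stage 2 (sketch): bucket channels with similar dynamic ranges so one
    # learnable (shift, scale) pair is shared per group, shrinking the
    # number of learnable parameters. Quantile-style bucketing by range
    # stands in for the paper's iterative hierarchical clustering.
    ranges = x.amax(dim=0) - x.amin(dim=0)
    order = ranges.argsort()
    groups = torch.zeros_like(order)
    for g, chunk in enumerate(order.chunk(num_groups)):
        groups[chunk] = g
    return groups


class GroupedShiftScale(torch.nn.Module):
    # Per-group learnable shift/scale applied channel-wise, followed by
    # uniform fake quantization with a learnable step size. Shift, scale,
    # and step are all trainable, mirroring the joint optimization the
    # abstract describes.
    def __init__(self, groups, num_groups, n_bits=4):
        super().__init__()
        self.register_buffer("groups", groups)
        self.shift = torch.nn.Parameter(torch.zeros(num_groups))
        self.scale = torch.nn.Parameter(torch.ones(num_groups))
        self.step = torch.nn.Parameter(torch.tensor(0.05))
        self.qmin = -(2 ** (n_bits - 1))
        self.qmax = 2 ** (n_bits - 1) - 1

    def forward(self, x):
        s = self.scale[self.groups]  # (C,) broadcast over the batch dim
        b = self.shift[self.groups]
        y = (x - b) / s              # move each channel to a shared range
        z = y / self.step
        z = z + (torch.round(z) - z).detach()  # straight-through estimator
        q = torch.clamp(z, self.qmin, self.qmax)
        y_hat = q * self.step        # dequantize
        return y_hat * s + b         # undo the channel-wise transform


# Toy usage: channels with widely varying scales, as in SAM activations.
x = torch.randn(256, 64) * torch.logspace(-1, 1, 64)
x = truncate_outliers(x)
groups = group_channels(x, num_groups=8)
fq = GroupedShiftScale(groups, num_groups=8, n_bits=4)
x_hat = fq(x)  # fake-quantized activations, differentiable end to end
```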