Vision Transformer (ViT) has become one of the prevailing backbone networks for various computer vision tasks, but it incurs high computational cost and inference latency. Recently, post-training quantization (PTQ) has emerged as a promising way to improve the efficiency of Vision Transformers. Nevertheless, existing PTQ approaches for ViTs suffer from inflexible quantization of the post-Softmax and post-GeLU activations, which exhibit power-law-like distributions. To address this issue, we propose a novel non-uniform quantizer dubbed AdaLog, which adapts the logarithmic base to accommodate the power-law-like distribution of activations while allowing for hardware-friendly quantization and de-quantization. By further employing bias reparameterization, the AdaLog quantizer is applicable to both the post-Softmax and post-GeLU activations. Moreover, we develop a beam-search strategy to select the best logarithmic base for AdaLog quantizers, as well as the scaling factors and zero points for uniform quantizers. Extensive experimental results on public benchmarks demonstrate the effectiveness of our approach for various ViT-based architectures and vision tasks such as classification, object detection, and instance segmentation.
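To make the core idea concrete, the following is a minimal, illustrative sketch of a logarithmic quantizer with an adaptable base, in the spirit of the AdaLog quantizer described above. It is not the paper's exact formulation: the function names, the example base, and the bit width are hypothetical choices for exposition, and the paper's bias reparameterization and beam search are omitted.

```python
import numpy as np

def log_quantize(x, base=1.5, n_bits=4, eps=1e-8):
    """Quantize positive activations (e.g. post-Softmax values in (0, 1])
    onto a logarithmic grid whose base is a tunable hyperparameter.

    A standard log2 quantizer fixes base=2; making the base adaptable
    lets the grid match a power-law-like activation distribution.
    """
    n_levels = 2 ** n_bits - 1
    # Integer level q such that x is approximately base ** (-q).
    q = np.round(-np.log(np.maximum(x, eps)) / np.log(base))
    return np.clip(q, 0, n_levels).astype(np.int64)

def log_dequantize(q, base=1.5):
    """Reconstruct the activation from its integer level: x_hat = base ** (-q)."""
    return np.power(float(base), -q.astype(np.float64))
```

With base 2 this reduces to the familiar log2 quantizer (0.5 maps to level 1, 0.25 to level 2, and so on); sweeping the base, as the beam-search strategy does over candidate bases, trades resolution near 1 against resolution near 0.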