Utilizing a unified model to detect multi-class anomalies is a promising solution to real-world anomaly detection. Despite their appeal, such models typically suffer from large model parameters and thus pose a challenge to their deployment on memory-constrained embedding devices. To address this challenge, this paper proposes a novel ViT-style multi-class detection approach named MoEAD, which can reduce the model size while simultaneously maintaining its detection performance. Our key insight is that the FFN layers within each stacked block (i.e., transformer blocks in ViT) mainly characterize the unique representations in these blocks, while the remaining components exhibit similar behaviors across different blocks. The finding motivates us to squeeze traditional stacked transformed blocks from N to a single block, and then incorporate Mixture of Experts (MoE) technology to adaptively select the FFN layer from an expert pool in every recursive round. This allows MoEAD to capture anomaly semantics step-by-step like ViT and choose the optimal representations for distinct class anomaly semantics, even though it shares parameters in all blocks with only one. Extensive experiments show that, compared to the state-of-the-art (SOTA) anomaly detection methods, MoEAD achieves a desirable trade-off between performance and memory consumption. It not only employs the smallest model parameters, has the fastest inference speed, but also obtains competitive detection performance.
Live content is unavailable. Log in and register to view live content