Smart surveillance systems must detect anomalies robustly, yet the sensitivity of visual data and the associated privacy concerns make this uniquely challenging. We propose FedVAD, a novel Federated Learning framework for Video Anomaly Detection that operates under the constraints of data heterogeneity and privacy preservation. FedVAD uses Federated Visual Consistency Clustering to group clients on the server side. It further introduces an Adaptive Semantic-Enhanced Distillation strategy that infuses knowledge from public videos into the framework: Large Language Models generate and calibrate semantic descriptions of the public videos, and the resulting video-text pairs are used to fine-tune a multimodal network, which then serves as a teacher when updating the global model. This approach not only refines video representations but also increases sensitivity to anomalous events. Extensive evaluations show that FedVAD improves both unsupervised and weakly supervised anomaly detection, rivaling centralized training paradigms while preserving privacy. The code will be made publicly available at https://anonymous.4open.science/r/FedVAD-BF51.
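As a rough illustration of the two ideas named above (grouping clients on the server side, then using a teacher to guide the global update), the following is a minimal sketch and not the FedVAD implementation. Every name in it (`cluster_clients`, `weighted_average`, `distillation_loss`, the `threshold` and `temperature` parameters) is hypothetical, and it swaps in generic stand-ins: greedy cosine-similarity grouping in place of Federated Visual Consistency Clustering, sample-weighted parameter averaging for aggregation, and a temperature-scaled KL divergence for the distillation signal.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_clients(embeddings: Dict[str, Vector],
                    threshold: float = 0.9) -> List[List[str]]:
    """Assumed stand-in for server-side clustering: a client joins the
    first cluster whose seed embedding is similar enough, else starts one."""
    clusters = []  # list of (seed_embedding, member_ids)
    for client_id, emb in embeddings.items():
        for seed, members in clusters:
            if cosine(seed, emb) >= threshold:
                members.append(client_id)
                break
        else:
            clusters.append((emb, [client_id]))
    return [members for _, members in clusters]


def weighted_average(weights: Sequence[Vector],
                     sizes: Sequence[int]) -> Vector:
    """Average client parameters, weighted by local sample counts."""
    total = sum(sizes)
    dim = len(weights[0])
    return [sum(w[i] * n for w, n in zip(weights, sizes)) / total
            for i in range(dim)]


def softmax(logits: Vector, temperature: float = 1.0) -> Vector:
    """Numerically stable softmax with temperature scaling."""
    peak = max(logits)
    exps = [math.exp((z - peak) / temperature) for z in logits]
    norm = sum(exps)
    return [e / norm for e in exps]


def distillation_loss(teacher_logits: Vector, student_logits: Vector,
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) over temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Toy usage with made-up client embeddings and parameters.
embeddings = {"cam_a": [1.0, 0.0], "cam_b": [0.9, 0.1], "cam_c": [0.0, 1.0]}
groups = cluster_clients(embeddings)
cluster_model = weighted_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
loss = distillation_loss([2.0, 0.5], [0.5, 2.0])
```

In this toy setup, clients with similar visual statistics are aggregated together, and the distillation term is what a server could minimize on public data so that the global model tracks the teacher; how FedVAD actually combines these steps is not specified in the abstract.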