Pre-trained models are widely used in machine learning (ML) because they minimize the demand for computational resources and training data. Recent studies show that pre-trained models are vulnerable to backdoor attacks. Additionally, prior work on hardware security has shown that ML systems can be compromised through bit-flip attacks mounted with Rowhammer. In this paper, we introduce \textbf{WBP} (weight bit poisoning), a novel attack framework that allows an attacker to implant a task-agnostic backdoor into the victim model \emph{during the fine-tuning process} through a limited number of \emph{weight bit flips}. Notably, WBP directly maximizes the distance between the output representations of normal and triggered inputs. We evaluate WBP on state-of-the-art CNN and Vision Transformer models across a variety of downstream tasks. Our experimental results demonstrate that, without any prior knowledge of the fine-tuning datasets, WBP compromises a wide range of downstream tasks with a 99.3% attack success rate on average by flipping as few as 11 bits among millions of parameters.
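To make the stated objective concrete, the sketch below illustrates one way the representation-separation goal could be expressed as a loss: push the output representations of triggered inputs away from those of the corresponding clean inputs. This is a minimal, hypothetical sketch; the names `model`, `trigger_mask`, and `trigger_pattern` are assumptions, and the actual WBP procedure of searching for and flipping specific weight bits (e.g., via Rowhammer) is not shown.

```python
import torch
import torch.nn.functional as F

def representation_separation_loss(model, x_clean, trigger_mask, trigger_pattern):
    """Hypothetical sketch of the objective described in the abstract:
    maximize the distance between output representations of clean and
    triggered inputs. The bit-flip search over weight bits that would
    optimize this objective is outside the scope of this sketch."""
    # Stamp the (assumed) trigger pattern onto the clean batch.
    x_triggered = x_clean * (1 - trigger_mask) + trigger_pattern * trigger_mask

    z_clean = model(x_clean)       # output representations for clean inputs
    z_trig = model(x_triggered)    # output representations for triggered inputs

    # Negative mean pairwise distance: minimizing this loss maximizes separation.
    return -F.pairwise_distance(z_clean, z_trig).mean()
```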