Robust fine-tuning aims to adapt a vision-language model to downstream tasks while preserving its zero-shot capabilities on unseen data. Recent studies have introduced fine-tuning strategies that improve in-distribution (ID) performance on the downstream task while minimizing deterioration in out-of-distribution (OOD) performance on unseen data. This balance is achieved either by aligning the fine-tuned representations with the pre-trained ones or by constraining large deviations of the fine-tuned weights from the pre-trained model. In the latter approach, the regularization term is applied uniformly to all parameters. Our work instead proposes to apply the regularization term selectively, based on the ``importance'' of each neuron to the fine-tuning dataset. To this end, we develop an importance-score metric that quantifies each neuron's importance to the downstream task, and we leverage it in two fine-tuning strategies: importance-guided selective fine-tuning and importance-guided regularization. Our approach can be used concurrently with representation-space methods and outperforms other parameter-space approaches. We improve the state-of-the-art on standard robust fine-tuning benchmarks across datasets in both the full-shot and low-shot settings.
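For concreteness, below is a minimal sketch of how importance-guided regularization might be instantiated. The abstract does not specify the importance metric, so this sketch assumes a gradient-magnitude proxy per output neuron and an inverse-importance weighting of a per-neuron L2 pull toward the pre-trained weights; the metric, the weighting direction, and all names (`neuron_importance`, `importance_guided_reg`, `lam`) are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch of importance-guided regularization (illustrative, not the paper's
# exact method): neurons scored as important to the downstream task are
# penalized less, so they are free to adapt; unimportant neurons are pulled
# toward their pre-trained values.
import torch
import torch.nn as nn


def neuron_importance(model: nn.Module, loss: torch.Tensor) -> dict[str, torch.Tensor]:
    """Assumed importance proxy: score each output neuron (row of a weight
    matrix) by the mean absolute gradient of its weights w.r.t. a task loss."""
    named = [(n, p) for n, p in model.named_parameters() if p.requires_grad]
    grads = torch.autograd.grad(
        loss, [p for _, p in named], retain_graph=True, allow_unused=True
    )
    scores = {}
    for (name, p), g in zip(named, grads):
        if g is not None and p.dim() >= 2:  # one score per output neuron
            scores[name] = g.abs().flatten(1).mean(dim=1)
    return scores


def importance_guided_reg(model: nn.Module,
                          pretrained: dict[str, torch.Tensor],
                          scores: dict[str, torch.Tensor],
                          lam: float = 0.1) -> torch.Tensor:
    """Weighted L2 penalty on deviation from the pre-trained weights,
    down-weighted for neurons with high importance scores."""
    reg = 0.0
    for name, p in model.named_parameters():
        if name in scores:
            w0 = pretrained[name].to(p.device)
            s = scores[name] / (scores[name].max() + 1e-8)  # normalize to [0, 1]
            per_neuron = (p - w0).pow(2).flatten(1).sum(dim=1)
            reg = reg + ((1.0 - s) * per_neuron).sum()  # low importance -> strong pull
    return lam * reg


# Hypothetical training step:
#   loss = task_loss(model(images), labels)
#   scores = neuron_importance(model, loss)   # or computed once on a held-out batch
#   total = loss + importance_guided_reg(model, pretrained_state, scores)
#   total.backward()
```

The same per-neuron scores could equally drive the selective-fine-tuning variant, e.g. by freezing (or masking the gradients of) neurons below an importance threshold rather than regularizing them; that choice is likewise an assumption here.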