In this paper, we focus on the problem of detecting samples that lead to model failure in the classification setting. Failures can stem from various sources, such as spurious correlations between image features and labels, class imbalance in the training data, and covariate shift between the training and test distributions. Existing approaches often rely on classifier prediction scores and therefore do not comprehensively identify all failure scenarios. Instead, we pose failure detection as the problem of identifying discrepancies between the classifier and an enhanced version of it. We build this enhanced model by infusing task-agnostic prior knowledge from a vision-language model (VLM), e.g., CLIP, that encodes general-purpose visual and semantic relationships. Unlike conventional training, our enhanced model, named the Prior Induced Model (PIM), learns to map the pre-trained model's features into the VLM latent space and aligns them with a set of pre-specified, fine-grained class-level attributes, which are later aggregated to estimate the class prediction. We argue that this training strategy makes the model concentrate only on task-specific attributes while predicting in place of the pre-trained model, and that it also enables human-interpretable explanations of failures. We conduct extensive empirical studies on various benchmark datasets and baselines, observing substantial improvements in failure detection.
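To make the pipeline concrete, the following is a minimal PyTorch sketch of the idea as described above: a learned projection maps frozen classifier features into the VLM embedding space, cosine-scores them against attribute text embeddings, and aggregates the attribute scores per class; disagreement with the original classifier then flags candidate failures. All names here (PriorInducedModel, attr_to_class, flag_failures) and the mean-aggregation choice are illustrative assumptions, not the paper's actual implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PriorInducedModel(nn.Module):
    """Hypothetical sketch of PIM: projects frozen classifier features
    into a VLM (e.g., CLIP) embedding space, scores them against
    pre-specified class-level attribute embeddings, and aggregates the
    attribute scores into class predictions."""

    def __init__(self, feat_dim, vlm_dim, attr_embeds, attr_to_class):
        super().__init__()
        # attr_embeds: (num_attrs, vlm_dim) text embeddings of the
        # fine-grained attributes, e.g., from CLIP's text encoder.
        # attr_to_class: (num_attrs, num_classes) binary matrix mapping
        # each attribute to the class it describes (assumed structure).
        self.proj = nn.Linear(feat_dim, vlm_dim)  # learned feature-to-VLM map
        self.register_buffer("attr_embeds", F.normalize(attr_embeds, dim=-1))
        self.register_buffer("attr_to_class", attr_to_class.float())

    def forward(self, feats):
        # Cosine similarity between projected features and attributes.
        z = F.normalize(self.proj(feats), dim=-1)   # (B, vlm_dim)
        attr_scores = z @ self.attr_embeds.t()      # (B, num_attrs)
        # Aggregate attribute scores into class logits by averaging
        # over the attributes assigned to each class.
        logits = attr_scores @ self.attr_to_class
        logits = logits / self.attr_to_class.sum(0).clamp(min=1)
        return logits, attr_scores

def flag_failures(classifier_logits, pim_logits):
    """Flag samples where the classifier and PIM disagree on the
    predicted class -- a simple proxy for discrepancy-based failure
    detection."""
    return classifier_logits.argmax(-1) != pim_logits.argmax(-1)

Because the prediction is assembled from named attributes, inspecting the per-attribute scores of a flagged sample is one way to surface the kind of human-interpretable failure explanation the abstract refers to.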