In this study, we investigate the task of active testing for label-efficient evaluation, which aims to estimate a model's performance on an unlabeled test dataset with a limited annotation budget. Previous approaches relied on deep ensemble models to identify highly informative instances for labeling, but fell short in dense recognition tasks like segmentation and object detection due to their high computational costs. In this work, we present MetaAT, a simple yet effective approach that adapts a Vision Transformer as a Meta Model for active testing. Specifically, we introduce a region loss estimation head to identify challenging regions for more accurate and informative instance acquisition. More importantly, the design of MetaAT allows it to handle annotation granularity at the region level, significantly reducing annotation costs in dense recognition tasks. As a result, our approach demonstrates consistent and substantial performance improvements over five popular benchmarks compared with state-of-the-art methods. Notably, on the CityScapes dataset, MetaAT achieves a 1.36% error rate in performance estimation using only 0.07% of annotations, marking a 10X improvement over existing state-of-the-art methods. To the best of our knowledge, MetaAT represents the first framework for active testing of dense recognition tasks.
Live content is unavailable. Log in and register to view live content