Object detection in aerial images is a longstanding yet challenging task. Despite the significant advancements in recent years, most works still show unsatisfactory performance due to the scale variation of objects. A standard strategy to address this problem is multi-scale training, aiming at learning scale-invariant feature representations. Albeit achieving inspiring improvements, such a multi-scale strategy is impractical for real application as inference time increases considerably. Besides, the original images are resized to different scales and subsequently trained separately, lacking information interaction across multi-scales. In this paper, we present a novel method called multi-scale cross distillation (MSCD) to address the above-mentioned issues. MSCD combines the merits of multi-scale training and knowledge distillation, enabling single-scale inference to achieve comparable or superior performance than multi-scale inference. Specifically, we first construct a parallel multi-branch architecture, in which each branch shares the same parameters yet takes images with different scales as input. Furthermore, we design an adaptive cross-scale distillation module that adaptively integrates the knowledge of different branches into a single one. Thus, the detectors trained with MSCD only require single-scale inference. Extensive experiments demonstrate the effectiveness of MSCD. Without bells and whistles, MSCD can facilitate prevalent two-stage detectors to outperform corresponding single-scale models by ~5 mAP and ~7 mAP improvement on DOTA and DIOR-R datasets, respectively.
Live content is unavailable. Log in and register to view live content