Knowledge distillation (KD), known for its ability to transfer knowledge from a cumbersome network (teacher) to a lightweight one (student) without altering the student's architecture, has been garnering increasing attention. Existing KD methods fall into two primary categories: feature-based, which focus on the features of intermediate layers, and logits-based, which target the logits of the final layer. This paper introduces a novel perspective by leveraging diverse knowledge sources within a unified KD framework. Specifically, we aggregate features from intermediate layers into a comprehensive representation, efficiently capturing essential knowledge without redundancy. We then predict distribution parameters from this representation. These steps transform knowledge from the intermediate layers into corresponding distributional forms, which are then conveyed through a unified distillation framework. Extensive experiments validate the effectiveness of the proposed method. Remarkably, the distilled student network not only significantly outperforms its original counterpart but also, in many cases, surpasses the teacher network.
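The pipeline described above can be pictured with a minimal PyTorch sketch, written from the abstract alone: intermediate feature maps are pooled and concatenated into one comprehensive representation, a small head predicts Gaussian parameters from it, and the teacher and student distributions are matched with a KL divergence. The module names, the diagonal-Gaussian parameterization, and the channel counts below are illustrative assumptions, not the authors' actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DistributionHead(nn.Module):
    """Aggregates intermediate features and predicts Gaussian parameters (assumed form)."""

    def __init__(self, feature_channels, hidden_dim=256):
        super().__init__()
        in_dim = sum(feature_channels)           # total channels of the tapped layers
        self.mu = nn.Linear(in_dim, hidden_dim)
        self.log_var = nn.Linear(in_dim, hidden_dim)

    def forward(self, feature_maps):
        # Global-average-pool every intermediate feature map to a vector,
        # then concatenate into a single comprehensive representation.
        pooled = [F.adaptive_avg_pool2d(f, 1).flatten(1) for f in feature_maps]
        z = torch.cat(pooled, dim=1)
        return self.mu(z), self.log_var(z)


def gaussian_kl(mu_s, log_var_s, mu_t, log_var_t):
    """KL(student || teacher) between diagonal Gaussians, averaged over the batch."""
    var_s, var_t = log_var_s.exp(), log_var_t.exp()
    kl = 0.5 * (log_var_t - log_var_s + (var_s + (mu_s - mu_t) ** 2) / var_t - 1.0)
    return kl.sum(dim=1).mean()


# Usage sketch: the feature lists would normally be gathered with forward hooks
# on the teacher and student backbones; shapes here are made up for illustration.
student_head = DistributionHead(feature_channels=[64, 128, 256])
teacher_head = DistributionHead(feature_channels=[256, 512, 1024])

student_feats = [torch.randn(8, c, 32 // 2**i, 32 // 2**i) for i, c in enumerate([64, 128, 256])]
teacher_feats = [torch.randn(8, c, 32 // 2**i, 32 // 2**i) for i, c in enumerate([256, 512, 1024])]

mu_s, log_var_s = student_head(student_feats)
with torch.no_grad():                     # teacher side is not trained
    mu_t, log_var_t = teacher_head(teacher_feats)

distill_loss = gaussian_kl(mu_s, log_var_s, mu_t, log_var_t)
```

In this sketch the shared hidden dimension is what makes the framework "unified": once both networks' intermediate knowledge is expressed as distribution parameters of the same size, a single divergence term can distill it regardless of how the backbones differ.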