The scene graph generation (SGG) task involves detecting objects within an image and predicting predicates that represent the relationships between the objects. However, each subject-object pair in SGG benchmark datasets is annotated with a single predicate, even though a single predicate may exhibit diverse semantics (i.e., semantic diversity), so existing SGG models are trained to predict the one and only predicate annotated for each subject-object pair. This in turn causes SGG models to overlook the semantic diversity that may exist in a predicate, leading to biased predictions. In this paper, we propose a novel model-agnostic Semantic Diversity-aware Prototype-based Learning (DPL) framework that enables unbiased predictions based on an understanding of the semantic diversity of predicates. Specifically, DPL learns the regions in the semantic space covered by each predicate to distinguish among the various semantics that a single predicate can represent. Extensive experiments demonstrate that our proposed model-agnostic DPL framework brings significant performance improvements to existing SGG models and effectively captures the semantic diversity of predicates.
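The abstract does not detail how DPL represents the regions a predicate covers, so the following is only a minimal sketch of the general idea of prototype-based predicate classification: each predicate class owns several learnable prototypes, and a subject-object pair is scored against a predicate by its similarity to the closest of that predicate's prototypes, so one predicate can span multiple distinct senses. The class name `MultiPrototypePredicateClassifier`, the parameter `protos_per_class`, and the max-over-prototypes scoring rule are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiPrototypePredicateClassifier(nn.Module):
    """Hypothetical sketch: each predicate class owns several learnable
    prototypes, so one predicate can cover a wider region of the semantic
    space (roughly one prototype per distinct sense)."""

    def __init__(self, feat_dim: int, num_predicates: int, protos_per_class: int = 4):
        super().__init__()
        # (num_predicates, protos_per_class, feat_dim) learnable prototypes
        self.prototypes = nn.Parameter(
            torch.randn(num_predicates, protos_per_class, feat_dim)
        )

    def forward(self, rel_feats: torch.Tensor) -> torch.Tensor:
        # rel_feats: (batch, feat_dim) features of subject-object pairs
        f = F.normalize(rel_feats, dim=-1)            # (B, D)
        p = F.normalize(self.prototypes, dim=-1)      # (C, K, D)
        # cosine similarity of every pair feature to every prototype
        sims = torch.einsum("bd,ckd->bck", f, p)      # (B, C, K)
        # a pair matches a predicate if it is close to ANY of that
        # predicate's prototypes; max over prototypes gives the class score
        logits, _ = sims.max(dim=-1)                  # (B, C)
        return logits

# Usage: scores over 50 predicate classes for a batch of pair features
clf = MultiPrototypePredicateClassifier(feat_dim=512, num_predicates=50)
logits = clf(torch.randn(8, 512))
print(logits.shape)  # torch.Size([8, 50])
```

Scoring against the nearest prototype (rather than a single class weight vector) is one simple way a model could keep different senses of the same predicate separated in the semantic space; the actual DPL training objective is not specified in the abstract.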