To bridge the gap between point labels and per-pixel labels, existing point-supervised panoptic segmentation methods usually estimate dense pseudo labels by assigning unlabeled pixels to instances according to rule-based pixel-to-instance distances. These rule-based distances rely on the Dijkstra algorithm and cannot be optimized end to end with point labels, so the resulting distances are usually suboptimal and yield inaccurate pseudo labels. We instead propose to assign unlabeled pixels to instances based on a learnable distance metric. Specifically, we represent each instance as an anchor query and use a distance branch to predict the pixel-to-instance distance from the cross-attention between anchor queries and pixel features; the predicted distance is supervised end to end by point labels. So that each query accurately represents its instance, we iteratively improve the anchor queries through query aggregating and query enhancing processes, then predict improved distances with the refined queries. Experiments demonstrate the effectiveness of our approach, which achieves state-of-the-art results. Code will be released upon acceptance.
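The idea of a learnable pixel-to-instance distance can be illustrated with a minimal NumPy sketch. It is not the released implementation: the function names (`predict_distances`, `assign_pixels`, `point_loss`), the sigmoid mapping from attention logits to distances, and the softmax cross-entropy used as point supervision are all assumptions made for illustration, and the query aggregating and query enhancing steps are omitted.

```python
import numpy as np

def predict_distances(queries, pixel_feats):
    """Hypothetical distance branch.

    queries:     (K, C) array, one anchor query per instance.
    pixel_feats: (N, C) array, one feature vector per pixel.
    Returns an (N, K) array of distances in (0, 1).
    """
    c = queries.shape[1]
    # Scaled dot-product cross-attention logits between pixels and queries.
    logits = pixel_feats @ queries.T / np.sqrt(c)
    # Assumed mapping: high affinity -> small distance, i.e. sigmoid(-logits).
    return 1.0 / (1.0 + np.exp(logits))

def assign_pixels(distances):
    """Pseudo label for each pixel: the instance with the smallest distance."""
    return distances.argmin(axis=1)

def point_loss(distances, point_idx, point_inst):
    """Assumed point supervision: cross-entropy on the softmax of negative
    distances, evaluated only at the labeled pixels.

    point_idx:  (P,) indices of labeled pixels.
    point_inst: (P,) instance index of each labeled pixel.
    """
    neg = -distances[point_idx]
    neg = neg - neg.max(axis=1, keepdims=True)  # numerical stability
    log_prob = neg - np.log(np.exp(neg).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(point_idx)), point_inst].mean()

# Toy example: two instances, three pixels, two labeled points.
queries = np.array([[4.0, 0.0], [0.0, 4.0]])
pixel_feats = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
distances = predict_distances(queries, pixel_feats)
pseudo_labels = assign_pixels(distances)
loss = point_loss(distances, np.array([0, 1]), np.array([0, 1]))
```

In a trainable version, the queries and pixel features would come from a network and the loss would be backpropagated through the distance branch, which is what lets point labels shape the distance directly, unlike a fixed rule-based distance.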