Segmenting and recognizing a diverse range of object parts is crucial in many computer vision and robotic applications. While object segmentation has made significant progress, part-level segmentation remains under-explored. Part segmentation requires discerning complex boundaries between parts, and the scarcity of annotated data further complicates the task. To tackle this problem, we propose a novel Weakly-supervised Part Segmentation (WPS) setting and an approach called WPS-SAM, built on the large-scale pre-trained vision foundation model Segment Anything Model (SAM). WPS-SAM is an end-to-end framework that extracts prompt tokens directly from images and performs pixel-level segmentation of part regions. During training, it uses only weak supervision in the form of bounding boxes or points. Extensive experiments demonstrate that, by exploiting the rich knowledge embedded in pre-trained foundation models, WPS-SAM outperforms other segmentation models trained with pixel-level strong annotations. Specifically, WPS-SAM achieves 68.93% mIOU and 79.53% mACC on the PartImageNet dataset, surpassing state-of-the-art fully supervised methods by approximately 4% in terms of mIOU.
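For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how mean intersection-over-union (mIOU) and mean per-class accuracy (mACC) are commonly computed from a predicted and a ground-truth label map. It is an illustration only: the function name `mean_iou_and_acc`, the choice to skip classes absent from the ground truth, and the per-image aggregation are assumptions, and the exact evaluation protocol used for PartImageNet may differ.

```python
from collections import Counter


def mean_iou_and_acc(pred, target, num_classes):
    """Return (mIoU, mAcc) for two equally sized 2-D label maps.

    pred and target are lists of rows of integer class ids.
    Hypothetical helper for illustration, not the paper's evaluation code.
    """
    inter = Counter()        # pixels where prediction and ground truth agree
    pred_area = Counter()    # pixels predicted as each class
    target_area = Counter()  # pixels labeled as each class
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            pred_area[p] += 1
            target_area[t] += 1
            if p == t:
                inter[t] += 1

    ious, accs = [], []
    for c in range(num_classes):
        if target_area[c] == 0:
            continue  # assumption: ignore classes absent from the ground truth
        union = pred_area[c] + target_area[c] - inter[c]
        ious.append(inter[c] / union)           # IoU = |P ∩ G| / |P ∪ G|
        accs.append(inter[c] / target_area[c])  # accuracy = |P ∩ G| / |G|

    if not ious:
        return 0.0, 0.0
    return sum(ious) / len(ious), sum(accs) / len(accs)


# Toy example: a 2x3 label map with three classes.
pred = [[0, 0, 1], [1, 1, 2]]
target = [[0, 1, 1], [1, 1, 2]]
miou, macc = mean_iou_and_acc(pred, target, num_classes=3)
```

In this toy example the per-class IoUs are 0.5, 0.75, and 1.0, so `miou` is 0.75, while the per-class accuracies are 1.0, 0.75, and 1.0. The gap between the two values reflects that IoU also penalizes false positives, whereas per-class accuracy does not.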