

Poster

GTMS: A Gradient-driven Tree-guided Mask-free Referring Image Segmentation Method

Haoxin Lv · Tianxiong Zhong · Sanyuan Zhao

Strong Double Blind: This paper was not made available on public preprint services during the review process.
Fri 4 Oct 1:30 a.m. PDT — 3:30 a.m. PDT

Abstract:

Referring image segmentation (RIS) aims to segment the object of interest described by a given natural language expression. Because fully supervised methods require expensive pixel-wise labeling, mask-free solutions supervised by low-cost labels are highly desirable. However, existing mask-free methods suffer from complicated architectures or unsatisfying performance. In this paper, we propose GTMS, a gradient-driven tree-guided mask-free referring image segmentation method that exploits both low-level structural information and high-level semantic information while using only a bounding box as the supervision signal. Specifically, we first mine structural information from the low-level features using a tree filter. Meanwhile, we explore semantic attention on the high-level features via GradCAM. Finally, the tree structure and attention information are used to refine the output of the segmentation model into pseudo labels, which in turn are used to optimize the model. To verify the effectiveness of our model, we conduct experiments on three benchmarks, i.e., RefCOCO/+/g. Notably, GTMS achieves 66.54%, 69.98%, and 63.41% IoU on the RefCOCO val, testA, and testB splits, outperforming most fully supervised models.
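The refinement loop the abstract describes can be pictured as: compute a GradCAM attention map from the high-level features, combine it with a tree-filtered structural map and the box prior, and threshold the refined prediction into a pseudo label that supervises the next training step. The following is a minimal PyTorch sketch of that step under stated assumptions; the function names, the precomputed tree_filtered map, and the box_mask prior are hypothetical illustrations for clarity, not the authors' implementation.

import torch
import torch.nn.functional as F

def gradcam_attention(features, score):
    # GradCAM-style attention: channel weights come from the gradient of a
    # scalar score (e.g., a box/objectness score) w.r.t. a high-level
    # feature map of shape (B, C, H, W) that participates in the graph.
    grads = torch.autograd.grad(score, features, retain_graph=True)[0]
    weights = grads.mean(dim=(2, 3), keepdim=True)              # (B, C, 1, 1)
    cam = F.relu((weights * features).sum(dim=1, keepdim=True)) # (B, 1, H, W)
    return cam / (cam.amax(dim=(2, 3), keepdim=True) + 1e-6)    # scale to [0, 1]

def make_pseudo_labels(seg_logits, cam, tree_filtered, box_mask, thresh=0.5):
    # Refine the model's soft prediction with the semantic attention map and
    # a (hypothetical, precomputed) tree-filtered structural map, restrict it
    # to the supervising bounding box, then binarize into a pseudo label.
    prob = torch.sigmoid(seg_logits)                 # (B, 1, H, W) soft mask
    refined = prob * cam * tree_filtered * box_mask  # all maps in [0, 1]
    return (refined > thresh).float()                # binary pseudo label

In a training loop, the binary map returned by make_pseudo_labels would serve as the supervision target for the segmentation head on the following iteration, closing the self-training cycle the abstract outlines.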
