Poster
VP-SAM: Taming Segment Anything Model for Video Polyp Segmentation via Disentanglement and Spatio-temporal Side Network
Zhixue Fang · Yuzhi Liu · Huisi Wu · Jing Qin
#168 · Strong Double Blind
We propose VP-SAM, a novel model adapted from the Segment Anything Model (SAM) for video polyp segmentation (VPS), a challenging task due to (1) the low contrast between polyps and the background and (2) the large frame-to-frame variations in polyp size, position, and shape. Our aim is to take advantage of the powerful representation capability of SAM while enabling it to effectively harness the temporal information of colonoscopic videos and disentangle polyps from a background of highly similar appearance. To achieve this, we propose two new techniques. First, we propose a semantic disentanglement adapter (SDA) that exploits the amplitude information of the Fourier spectrum to help SAM differentiate polyps from the background more effectively. Second, we propose a spatio-temporal side network (STSN) that provides SAM with the spatio-temporal information of videos, enabling SAM to effectively track the motion of polyps. Extensive experiments on SUN-SEG, CVC-612, and CVC-300 demonstrate that our method significantly outperforms state-of-the-art methods. While this work focuses on colonoscopic videos, the proposed method is general enough to be applied to other medical videos posing similar challenges. Code will be released upon publication.
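The abstract does not specify how the SDA uses the Fourier amplitude; as background only, the following minimal NumPy sketch illustrates the standard amplitude/phase decomposition that such Fourier-based modules typically build on (the function names here are ours, not from the paper): the 2-D FFT splits a feature map into an amplitude spectrum, which largely carries appearance/style statistics, and a phase spectrum, which largely carries spatial structure, and the two can be recombined losslessly.

```python
import numpy as np

def amplitude_phase(x):
    """Split a 2-D array into its Fourier amplitude and phase spectra."""
    f = np.fft.fft2(x)
    return np.abs(f), np.angle(f)

def recombine(amp, phase):
    """Reconstruct the spatial signal from amplitude and phase spectra."""
    return np.real(np.fft.ifft2(amp * np.exp(1j * phase)))

# Round trip: decomposing and recombining recovers the original map,
# so a module may manipulate amplitude alone while preserving structure.
feat = np.random.rand(32, 32)
amp, phase = amplitude_phase(feat)
restored = recombine(amp, phase)
```

Manipulating only the amplitude term (e.g. mixing or normalizing it) while keeping the phase fixed is a common way to alter appearance statistics without disturbing spatial layout, which is one plausible route to disentangling polyps from a visually similar background.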