Poster
VP-SAM: Taming Segment Anything Model for Video Polyp Segmentation via Disentanglement and Spatio-temporal Side Network
Zhixue Fang · Yuzhi Liu · Huisi Wu · Jing Qin
#168 · Strong Double Blind
We propose VP-SAM, a novel model adapted from the Segment Anything Model (SAM) for video polyp segmentation (VPS), a challenging task due to (1) the low contrast between polyps and the background and (2) the large frame-to-frame variations in polyp size, position, and shape. Our aim is to take advantage of the powerful representation capability of SAM while enabling it to effectively harness the temporal information of colonoscopic videos and disentangle polyps from a background of highly similar appearance. To achieve this, we propose two new techniques. First, we propose a semantic disentanglement adapter (SDA) that exploits the amplitude information of the Fourier spectrum to help SAM differentiate polyps from the background more effectively. Second, we propose a spatio-temporal side network (STSN) that provides SAM with the spatio-temporal information of videos, enabling SAM to effectively track the motion of polyps. Extensive experiments on SUN-SEG, CVC-612, and CVC-300 demonstrate that our method significantly outperforms state-of-the-art methods. While this work focuses on colonoscopic videos, the proposed method is general enough to be applied to other medical videos posing similar challenges. Code will be released upon publication.
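The abstract does not specify how the SDA uses the Fourier amplitude; as background only, the following minimal NumPy sketch illustrates the standard amplitude/phase decomposition that such Fourier-based modules typically build on (the function names here are ours, not from the paper): the 2-D FFT splits a feature map into an amplitude spectrum, which largely carries appearance/style statistics, and a phase spectrum, which largely carries spatial structure, and the two can be recombined losslessly.

```python
import numpy as np

def amplitude_phase(x):
    """Split a 2-D array into its Fourier amplitude and phase spectra."""
    f = np.fft.fft2(x)
    return np.abs(f), np.angle(f)

def recombine(amp, phase):
    """Reconstruct the spatial signal from amplitude and phase spectra."""
    return np.real(np.fft.ifft2(amp * np.exp(1j * phase)))

# Round trip: decomposing and recombining recovers the original map,
# so a module may manipulate amplitude alone while preserving structure.
feat = np.random.rand(32, 32)
amp, phase = amplitude_phase(feat)
restored = recombine(amp, phase)
```

Manipulating only the amplitude term (e.g. mixing or normalizing it) while keeping the phase fixed is a common way to alter appearance statistics without disturbing spatial layout, which is one plausible route to disentangling polyps from a visually similar background.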