We present SegGen, a new data generation approach that pushes the performance boundaries of state-of-the-art image segmentation models. One major bottleneck of previous data synthesis methods for segmentation is the design of the "segmentation labeler module", which is used to synthesize segmentation masks for images. These labeler modules, which are segmentation models themselves, bound the performance of the downstream segmentation models trained on their synthetic masks. Such methods encounter a "chicken-or-egg dilemma" and thus fail to outperform existing segmentation models. To address this issue, we propose a novel method that reverses the traditional data generation process: we first (i) generate highly diverse segmentation masks that match the real-world distribution from text prompts, and then (ii) synthesize realistic images conditioned on those segmentation masks. In this way, we avoid the need for any segmentation labeler module. SegGen integrates two data generation strategies, MaskSyn and ImgSyn, to substantially improve the diversity of the synthetic masks and images. Notably, the high quality of our synthetic data enables our method to outperform the previous data synthesis method by +25.2 mIoU on ADE20K when trained with purely synthetic data. On the highly competitive ADE20K and COCO benchmarks, our data generation method markedly improves the performance of state-of-the-art segmentation models in semantic segmentation, panoptic segmentation, and instance segmentation. Moreover, experiments show that training with our synthetic data makes segmentation models more robust to unseen data domains, including real-world and AI-generated images.
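To make the reversed pipeline concrete, the following is a minimal sketch of the two-step generation order described above (text-to-mask first, then mask-to-image). All names here (text_to_mask_model, mask_to_image_model, generate_synthetic_pair) are hypothetical placeholders for illustration, not SegGen's actual API.

def generate_synthetic_pair(prompt, text_to_mask_model, mask_to_image_model):
    """Produce one synthetic (image, mask) training pair from a text prompt."""
    # Step (i): sample a diverse segmentation mask directly from the text prompt,
    # so no segmentation "labeler" model is needed to annotate an image.
    mask = text_to_mask_model(prompt)
    # Step (ii): synthesize a realistic image conditioned on that mask,
    # making the mask a ground-truth label by construction.
    image = mask_to_image_model(mask, prompt)
    return image, mask


def build_synthetic_dataset(prompts, text_to_mask_model, mask_to_image_model):
    """Assemble a synthetic dataset for training downstream segmentation models."""
    return [
        generate_synthetic_pair(p, text_to_mask_model, mask_to_image_model)
        for p in prompts
    ]

Because the image is generated to agree with the mask rather than the mask being predicted from the image, the quality of the labels is no longer capped by an existing segmentation model.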