Poster
PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects
Junyi Li · Junfeng Wu · Weizhi Zhao · Song Bai · Xiang Bai
# 142
Strong Double Blind
We present PartGLEE, a part-level foundation model for locating and identifying both objects and parts in images. Through a unified framework, PartGLEE accomplishes detection, segmentation, and grounding of instances at any granularity in open-world scenarios. Specifically, we propose a Q-Former to construct the hierarchical relationship between objects and parts, parsing every object into its corresponding semantic parts. By incorporating a large amount of object-level data, the hierarchical relationships can be extended, enabling PartGLEE to recognize a rich variety of parts. We conduct comprehensive empirical studies to validate the effectiveness of our method: PartGLEE achieves state-of-the-art performance across various part-level tasks and maintains comparable results on object-level tasks. Our further analysis indicates that the hierarchical cognitive ability of PartGLEE facilitates a detailed comprehension of images for mLLMs. Code will be released.
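The object-to-part parsing via a Q-Former can be sketched, in simplified form, as a small set of learnable part queries cross-attending to an object's features to produce one embedding per candidate part. The function names, dimensions, and single-layer attention below are illustrative assumptions for exposition, not the paper's actual implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax over the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def parse_object_into_parts(object_feats, part_queries, Wq, Wk, Wv):
    """One cross-attention step: learnable part queries attend to an
    object's feature tokens, yielding one embedding per candidate part.
    (Hypothetical single-layer sketch of a Q-Former-style module.)"""
    q = part_queries @ Wq          # (n_parts, d)
    k = object_feats @ Wk          # (n_tokens, d)
    v = object_feats @ Wv          # (n_tokens, d)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (n_parts, n_tokens)
    return attn @ v                # (n_parts, d): part-level embeddings

# Illustrative shapes: one object represented by 8 feature tokens,
# parsed into 4 part embeddings of dimension 32.
rng = np.random.default_rng(0)
d, n_parts, n_tokens = 32, 4, 8
object_feats = rng.normal(size=(n_tokens, d))
part_queries = rng.normal(size=(n_parts, d))   # learnable in practice
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
part_embeds = parse_object_into_parts(object_feats, part_queries, Wq, Wk, Wv)
print(part_embeds.shape)  # (4, 32)
```

In the full model these part embeddings would then be decoded into boxes, masks, and labels alongside the object-level queries; here the sketch only shows how a fixed query set can turn one object representation into several part representations.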