Category-level articulated object pose estimation focuses on estimating the poses of unknown articulated objects within known categories. Despite its significance, this task remains challenging due to the varying shapes and poses of objects, the high cost of dataset annotation, and complex real-world environments. In this paper, we propose a novel self-supervised approach that accomplishes this task from a single-frame point cloud. Our model consistently generates a canonical reconstruction, with a canonical pose and joint state, of the entire input object; it estimates an object-level pose that reduces overall pose variance and part-level poses that align each part of the input with the corresponding part of the reconstruction. Experimental results demonstrate that our approach achieves state-of-the-art performance among self-supervised methods and performance comparable to supervised methods. To assess our model in real-world scenarios, we also introduce a novel real-world articulated object benchmark dataset.
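To make the part-level alignment idea above concrete, the following is a minimal illustrative sketch, not the paper's actual objective: given hypothetical per-part rigid transforms, each observed part is mapped toward the corresponding part of a canonical reconstruction and the residual is scored with a symmetric Chamfer distance. All names (`part_alignment_loss`, `chamfer`, `part_poses`) are assumptions introduced here for illustration.

```python
# Illustrative sketch only; not the authors' implementation.
import numpy as np

def chamfer(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer distance between point sets a (N,3) and b (M,3)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # (N, M) pairwise distances
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def part_alignment_loss(input_parts, canon_parts, part_poses):
    """Apply each part's estimated rigid transform (R, t) to the observed
    points and compare against the corresponding canonical part."""
    loss = 0.0
    for obs, canon, (R, t) in zip(input_parts, canon_parts, part_poses):
        aligned = obs @ R.T + t          # hypothetical part-level pose applied to observed part
        loss += chamfer(aligned, canon)  # alignment residual for this part
    return loss / len(input_parts)

# Toy usage: two identical "parts" with identity poses give a near-zero loss.
rng = np.random.default_rng(0)
parts = [rng.normal(size=(64, 3)) for _ in range(2)]
poses = [(np.eye(3), np.zeros(3))] * 2
print(part_alignment_loss(parts, parts, poses))
```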