Despite the recent success of Transformers in point cloud registration, the cross-attention mechanism, while enabling point-wise feature exchange between point clouds, suffers from redundant feature interactions among semantically unrelated regions. Moreover, recent methods rely solely on 3D information to extract robust feature representations, overlooking the rich semantic information in 2D images. In this paper, we propose SemReg, a novel 2D-3D cross-modal framework that exploits the semantic information in 2D images to enhance the learning of rich and robust feature representations for point cloud registration. In particular, we design a Gaussian Mixture Semantic Prior that fuses 2D semantic features across RGB frames to reveal semantic correlations between regions across the point cloud pair. Building on this prior, we propose a Semantics Guided Feature Interaction module that emphasizes feature interactions between semantically similar regions while suppressing superfluous interactions during the cross-attention stage. In addition, we design a Semantics Aware Focal Loss that facilitates the learning of robust features, and a Semantics Constrained Matching module that performs matching only between regions sharing similar semantics. We evaluate SemReg on the public indoor (3DMatch) and outdoor (KITTI) datasets, and experimental results show that it achieves superior registration performance compared with state-of-the-art techniques.
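To make the core idea of semantics-guided cross-attention concrete, below is a minimal PyTorch sketch, not the paper's implementation. It assumes a precomputed semantic prior matrix `prior` of shape (N_src, N_tgt) with entries in [0, 1] (a hypothetical stand-in for the Gaussian Mixture Semantic Prior), and biases the cross-attention logits with its log so that attention between semantically dissimilar regions is suppressed; the class name `SemanticsGuidedCrossAttention` and all tensor layouts are illustrative assumptions.

```python
import torch
import torch.nn as nn


class SemanticsGuidedCrossAttention(nn.Module):
    """Cross-attention whose logits are biased by a semantic prior.

    `prior` is a hypothetical (N_src, N_tgt) matrix of semantic
    similarity scores in [0, 1] between regions of the two point
    clouds; it is NOT the paper's exact Gaussian Mixture Semantic
    Prior, only a stand-in with the same role.
    """

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, src: torch.Tensor, tgt: torch.Tensor,
                prior: torch.Tensor) -> torch.Tensor:
        # src: (N_src, C) source features; tgt: (N_tgt, C) target
        # features; prior: (N_src, N_tgt) semantic similarities.
        n_src, n_tgt = src.shape[0], tgt.shape[0]
        q = self.q_proj(src).view(n_src, self.num_heads, self.head_dim)
        k = self.k_proj(tgt).view(n_tgt, self.num_heads, self.head_dim)
        v = self.v_proj(tgt).view(n_tgt, self.num_heads, self.head_dim)

        # Scaled dot-product logits: (num_heads, N_src, N_tgt).
        logits = torch.einsum('shd,thd->hst', q, k) / self.head_dim ** 0.5
        # Additive log-prior bias: pairs with near-zero semantic
        # similarity receive a large negative offset, suppressing
        # superfluous interactions; similar pairs are emphasized.
        logits = logits + prior.clamp(min=1e-6).log().unsqueeze(0)

        attn = logits.softmax(dim=-1)
        out = torch.einsum('hst,thd->shd', attn, v).reshape(n_src, -1)
        return self.out_proj(out)


# Usage example with random features and a uniform (uninformative) prior.
if __name__ == "__main__":
    src, tgt = torch.randn(128, 64), torch.randn(96, 64)
    prior = torch.rand(128, 96)
    layer = SemanticsGuidedCrossAttention(dim=64)
    print(layer(src, tgt, prior).shape)  # torch.Size([128, 64])
```

An additive log-space bias is only one of several plausible ways to inject such a prior (multiplicative re-weighting of the attention map or hard masking of low-similarity pairs would be alternatives); the paper's actual module may differ.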