Reprojection consistency is widely used for self-supervised 3D human pose estimation, yet its inherent limitations have received little attention. Lacking camera parameters and absolute position, self-supervised methods map 3D poses to 2D using orthographic projection, whereas the 2D inputs are produced by perspective projection. This discrepancy between the two projection models creates an offset between the supervision and prediction spaces, which limits achievable accuracy. To address this problem, we propose rotated orthographic projection, which geometrically approximates perspective projection by applying a rotation before the orthographic projection. Further, we optimize the reference point selection according to the human body structure and propose group rotated orthogonal projection, which significantly narrows the gap between the two projection models. In addition, the reprojection consistency loss cannot penalize 3D poses that are incorrectly flipped along the Z-axis, because such poses can still reproject onto the same 2D input. We therefore introduce the joint reverse constraint, which limits the range of angles between the local reference plane and the end joints, penalizing unrealistic 3D poses and disambiguating the model's Z-axis orientation. The proposed method achieves state-of-the-art (SOTA) performance on both the Human3.6M and MPI-INF-3DHP datasets. In particular, it reduces the mean error on Human3.6M from 65.9 mm to 42.9 mm, a 34.9% improvement over the previous SOTA self-supervised method.
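The projection mismatch described above can be illustrated with a small numerical sketch. This is not the paper's implementation: the rotation rule used here (rotating root-relative joints so that the ray from the camera to the root aligns with the optical axis), the helper `rotate_to_optical_axis`, the toy pose, and the known root position are all assumptions made for illustration. A self-supervised method would not have access to the absolute root position.

```python
import math

def perspective(point, focal=1.0):
    """Pinhole projection of a camera-space point (x, y, z)."""
    x, y, z = point
    return (focal * x / z, focal * y / z)

def rotate_to_optical_axis(offset, root):
    """Rotate a root-relative offset so the camera-to-root ray maps onto +Z.

    Hypothetical helper: yaw about Y, then pitch about X.
    """
    rx, ry, rz = root
    yaw = math.atan2(rx, rz)
    pitch = math.atan2(ry, math.hypot(rx, rz))
    x, y, z = offset
    # Yaw: bring the ray into the Y-Z plane.
    x1 = math.cos(yaw) * x - math.sin(yaw) * z
    z1 = math.sin(yaw) * x + math.cos(yaw) * z
    # Pitch: bring the ray onto the Z axis.
    y2 = math.cos(pitch) * y - math.sin(pitch) * z1
    z2 = math.sin(pitch) * y + math.cos(pitch) * z1
    return (x1, y2, z2)

def mean_error(pred, target):
    """Mean Euclidean distance between corresponding 2D points."""
    return sum(math.dist(p, q) for p, q in zip(pred, target)) / len(pred)

# Toy data (assumed): a subject standing off-center, 5 m from the camera.
root = (2.0, 0.0, 5.0)
offsets = [  # root-relative joints in metres, with noticeable depth extent
    (0.0, -0.5, 0.1),
    (0.3, -0.2, 0.3),
    (-0.3, -0.2, -0.3),
    (0.1, 0.8, 0.2),
    (-0.1, 0.8, -0.2),
]
scale = 1.0 / root[2]  # weak-perspective scale, assumed known here

# "Supervision space": root-relative 2D joints under true perspective.
root_2d = perspective(root)
target = []
for dx, dy, dz in offsets:
    u, v = perspective((root[0] + dx, root[1] + dy, root[2] + dz))
    target.append((u - root_2d[0], v - root_2d[1]))

# "Prediction space", variant 1: plain orthographic projection (drop Z).
plain = [(scale * dx, scale * dy) for dx, dy, dz in offsets]

# Variant 2: rotate first, then project orthographically.
rotated = []
for off in offsets:
    x, y, _ = rotate_to_optical_axis(off, root)
    rotated.append((scale * x, scale * y))

err_plain = mean_error(plain, target)
err_rotated = mean_error(rotated, target)
print(f"plain orthographic error:   {err_plain:.5f}")
print(f"rotated orthographic error: {err_rotated:.5f}")
```

In this toy setup the rotated variant lands noticeably closer to the perspective targets than plain orthographic projection, because the rotation accounts for the way depth offsets leak into the image coordinates of an off-center subject. The residual error comes from effects a single global rotation does not capture.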