

Poster

3D Hand Pose Estimation in Everyday Egocentric Images

Aditya Prakash · Ruisen Tu · Matthew Chang · Saurabh Gupta

Fri 4 Oct 1:30 a.m. PDT — 3:30 a.m. PDT

Abstract:

3D hand pose estimation in everyday egocentric images is challenging for several reasons: poor visual signal (occlusion from the object of interaction, low resolution, and motion blur), large perspective distortion (hands are close to the camera), and lack of 3D annotations outside of controlled settings. While existing methods often use hand crops as input to focus on fine-grained visual information and deal with poor visual signal, the challenges arising from perspective distortion and the lack of in-the-wild 3D annotations have not been systematically studied. We focus on this gap and explore the impact of different practices, i.e., using crops as input, incorporating camera information, adding auxiliary supervision, and scaling up datasets. Based on our findings, we present WildHands, a system for 3D hand pose estimation in everyday egocentric images. Zero-shot evaluation on four diverse datasets (H2O, Assembly, Epic, and EgoExo4D) demonstrates the effectiveness of our approach across 2D and 3D metrics, where we beat past methods by 7.4% -- 66%. In system-level comparisons, WildHands achieves the best 3D hand pose score on the egocentric split of ARCTIC, beats the popular FrankMocap system, and is competitive with the concurrent HaMeR system.
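The abstract notes that hand crops discard perspective information and that incorporating camera information helps. One common way to retain this signal (a minimal sketch of the general technique, not necessarily the authors' exact formulation) is to adjust the pinhole intrinsics matrix to the crop, so the model still sees where the crop sat relative to the image center:

```python
import numpy as np

def crop_intrinsics(K, x0, y0, crop_w, crop_h, out_size=224):
    """Adjust a 3x3 pinhole intrinsics matrix K for a crop of the full image.

    Cropping shifts the principal point by the crop offset; resizing the
    crop to `out_size` scales the focal lengths and principal point. The
    resulting principal-point offset encodes the perspective distortion
    that a crop-only model would otherwise lose.
    (Function name and arguments are illustrative, not from the paper.)
    """
    K = np.asarray(K, dtype=float).copy()
    # Shift principal point into crop coordinates.
    K[0, 2] -= x0
    K[1, 2] -= y0
    # Rescale to the network input resolution.
    K[0, :] *= out_size / crop_w
    K[1, :] *= out_size / crop_h
    return K

# Example: a 200x200 hand crop at (100, 80) in a 640x480 image.
K_full = np.array([[500.0, 0.0, 320.0],
                   [0.0, 500.0, 240.0],
                   [0.0,   0.0,   1.0]])
K_crop = crop_intrinsics(K_full, 100, 80, 200, 200)
```

Feeding `K_crop` (or features derived from it) to the network alongside the crop is one way a model can distinguish a hand at the image center from an identical-looking crop near the border, where perspective distortion is larger.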
