

Poster

4Diff: 3D-Aware Diffusion Model for Third-to-First Viewpoint Translation

Feng Cheng · Mi Luo · Huiyu Wang · Alex Dimakis · Lorenzo Torresani · Gedas Bertasius · Kristen Grauman

Strong Double Blind: This paper was not made available on public preprint services during the review process.
Thu 3 Oct 1:30 a.m. PDT — 3:30 a.m. PDT

Abstract:

We present 4Diff, a 3D-aware diffusion model addressing the exo-to-ego viewpoint translation problem: generating first-person (egocentric) view images from third-person (exocentric) images. Leveraging diffusion models' ability to generate photorealistic images, we propose a transformer-based diffusion model that incorporates geometry priors through two mechanisms: (i) egocentric prior rendering and (ii) 3D-aware rotary cross-attention. The former integrates egocentric layout cues through point cloud rasterization, while the latter incorporates exocentric semantic features by guiding the attention between them and the diffusion model's feature maps according to their geometric relationship. Our experiments on the challenging and diverse Ego-Exo4D multiview dataset demonstrate superior performance compared to state-of-the-art approaches. Notably, our approach exhibits robust generalization to novel environments not encountered during training. The code and pretrained models will be made public.
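The 3D-aware rotary cross-attention can be pictured as standard cross-attention whose queries and keys are phase-rotated according to associated 3D coordinates, so that the attention weights between egocentric (diffusion) tokens and exocentric (semantic) tokens reflect their geometric relationship. The PyTorch fragment below is a minimal illustrative sketch of that idea, not the authors' released code; the function names, the frequency schedule, and the assumption that every token carries a 3D coordinate (e.g. recovered from camera poses and rasterized point clouds) are illustrative assumptions.

import torch
import torch.nn.functional as F

def rotary_phases(coords_3d, dim):
    # coords_3d: (B, N, 3) per-token positions in a shared 3D frame (assumed given).
    # dim is the attention head width, assumed divisible by 6 (3 axes x feature pairs).
    freqs = torch.exp(torch.linspace(0.0, -4.0, dim // 6, device=coords_3d.device))
    angles = coords_3d.unsqueeze(-1) * freqs      # (B, N, 3, dim//6)
    return angles.flatten(-2)                     # (B, N, dim//2)

def apply_rotary(x, phases):
    # Rotate consecutive feature pairs of x by the per-token phases.
    x1, x2 = x[..., 0::2], x[..., 1::2]
    cos, sin = phases.cos(), phases.sin()
    return torch.stack([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1).flatten(-2)

def rotary_cross_attention(q_feats, kv_feats, q_xyz, kv_xyz, w_q, w_k, w_v):
    # q_feats: (B, Nq, D) diffusion-model features (queries);
    # kv_feats: (B, Nk, D) exocentric semantic features (keys/values).
    q = apply_rotary(q_feats @ w_q, rotary_phases(q_xyz, w_q.shape[1]))
    k = apply_rotary(kv_feats @ w_k, rotary_phases(kv_xyz, w_k.shape[1]))
    v = kv_feats @ w_v
    attn = F.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
    return attn @ v                               # (B, Nq, head_dim)

# Illustrative shapes only: 96-dim head, 64 query tokens, 256 exocentric tokens.
B, Nq, Nk, D, Dh = 2, 64, 256, 128, 96
out = rotary_cross_attention(
    torch.randn(B, Nq, D), torch.randn(B, Nk, D),
    torch.randn(B, Nq, 3), torch.randn(B, Nk, 3),
    torch.randn(D, Dh), torch.randn(D, Dh), torch.randn(D, Dh),
)

Because the rotation depends only on relative phase, tokens that are geometrically close in the shared 3D frame end up with better-aligned query/key pairs, which is one simple way such geometry-conditioned attention can be realized.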
